Exploiting genomics in the search for new drugs

ABSTRACT

The cellular effects of potentially therapeutic compounds are characterized in mammalian cells and yeast. In the latter case the effects can be characterized on a genome-wide scale by monitoring changes in messenger RNA levels in treated cells with high-density oligonucleotide probe arrays.

BACKGROUND OF THE INVENTION

[0001] Many biological functions are accomplished by altering the expression of various genes through transcriptional (e.g. through control of initiation, provision of RNA precursors, RNA processing, etc.) and/or translational control. For example, fundamental biological processes such as cell cycle, cell differentiation and cell death are often characterized by the variations in the expression levels of groups of genes.

[0002] Changes in gene expression also are associated with pathogenesis. For example, the lack of sufficient expression of functional tumor suppressor genes and/or the overexpression of oncogenes/proto-oncogenes could lead to tumorigenesis (Marshall, Cell, 64: 313-326 (1991); Weinberg, Science, 254: 1138-1146 (1991), incorporated herein by reference for all purposes). Thus, changes in the expression levels of particular genes (e.g. oncogenes or tumor suppressors) serve as signposts for the presence and progression of various diseases.

[0003] Often drugs are screened and prescreened for the ability to interact with a major target without regard to other effects the drugs have on cells. Often such other effects cause toxicity in the whole animal, which prevents the development and use of the potential drug. Therefore, there is a need in the art to develop a systematic approach to test and develop new drugs for their effects on cellular metabolism without relying on gross morphologic and phenotypic effects.

SUMMARY OF THE INVENTION

[0004] This invention provides methods and compositions for studying the complex relationships among drugs and genes. In some of its specific applications, this invention provides methods and compositions for detecting alternate targets for drug screening and development by monitoring the expression of genes affected by a drug or mutation.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] FIG. 1. (FIG. 1A) Scheme for the combinatorial synthesis of 2,6,9-trisubstituted purines from a 2-, 6-, or 9-linked purine scaffold with amination and alkylation chemistries. Chemical structures of CDK inhibitors (FIG. 1B) flavopiridol, (FIG. 1C) olomoucine and roscovitine, (FIG. 1D) purvalanol A and B, and (FIG. 1E) 52 and 52Me.

[0006] FIG. 2. (FIG. 2A) Purvalanol B bound to CDK2 (black sticks, principal conformation only) is compared with bound (1) olomoucine (white sticks) and bound roscovitine (orange sticks), (2) bound flavopiridol (green sticks), and (3) bound ATP (yellow sticks). The comparisons are based on superposition of the C atoms of CDK2. The ligands are shown in ball-and-stick representation with carbon atoms colored white, nitrogen atoms colored blue, oxygen atoms colored red, phosphorus atoms colored violet, and the chlorine atom of purvalanol colored green. (FIG. 2B) Schematic drawing of CDK2-purvalanol B interactions. Protein side chain contacts are indicated by lines connecting the respective residue box and interactions to main chain atoms are shown as lines to the specific main chain atoms. Van der Waals contacts are indicated by thin dotted lines, and H bonds by dashed lines. For H bonds the distances between the nonhydrogen atoms are indicated in angstroms. W, water.

[0007] FIG. 3. Representative transcripts observed to change more than twofold for triplicate hybridizations for each of two independent experiments (except for cdc28-4, which represents triplicate hybridizations of RNA from a single experiment). (FIG. 3A) Names of the genes whose mRNA levels change in common to 52 and flavopiridol (none of these transcripts changed significantly in the 52Me profile): YBR214w (similar to Schizosaccharomyces pombe protein moc1 involved in meiosis and mitosis); YGR108W (CLB1, G₂-M phase cyclin); YBL003c (HTA2, histone); YBL002w (HTB2, histone); YNL327W (EGT2, involved in timing of cell separation); YLR286C* (CTS1, endochitinase); YJL157C* (FAR1, inhibitor of Cdc28p/Cln1,2p complexes); YPR119W* (CLB2, G₂-M phase cyclin); YHR096C (HXT5, homologous to hexose transporters); YAL061W (unknown, similar to alcohol or sorbitol dehydrogenase); YKR097W (PCK1, phosphoenolpyruvate carboxykinase); YGR043C (similar to Tal1p, a transaldolase); YMR105C (PGM2, phosphoglucomutase); YBR169c (SSE2, heat shock protein of HSP70 family); YBR072W (HSP26, heat shock protein induced by osmostress); YLL026W (HSP104); YCR021c (HSP30); YPL240C (HSP82, chaperonin homologous to Escherichia coli HtpG); YDR171W (HSP42, involved in restoration of cytoskeleton during mild stress); YOR328W (PDR10, member of the ATP binding cassette superfamily); YDR406w (PDR15); YDL223c (unknown); YER150w (similar to Sed1p, an abundant cell surface glycoprotein); YGR032W (GSC2, component of 1,3-glucan synthase); YGL179C* (serine-threonine kinase similar to Elm1p and Kin82p); YLR178C (TFS1, Cdc25-dependent nutrient and ammonia response cell cycle regulator); YNR009W (unknown); YFL031W (HAC1, basic leucine zipper protein, activates unfolded-protein response pathway); and YHR143W (unknown). (FIG. 3B) Transcript changes that may result from Pho85p kinase inhibition observed in either the 52 or flavopiridol profiles: YOL001W (PHO80, a cyclin that associates with Pho85p); YGR233C (PHO81, inhibitory protein that associates with Pho80p or Pho85p); YFL014W (HSP12, heat shock protein); YHR071W (PCL5, cyclinlike and associates with Pho85p); YGR088W (CTT1, cytosolic catalase T); YBR093c (PHO5, secreted acid phosphatase); YLL039c (UBI4, ubiquitin); YCL009c (PHO84, phosphate transporter); YML116W (PHO8, vacuolar alkaline phosphatase); YBR296c (homologous to a phosphate-repressible permease). (FIG. 3C) Transcripts that change for cdc28-4, cdc28-4 and 52, cdc28-4 and flavopiridol, and 52: YBR147W (unknown, has 7 potential transmembrane domains); YOL155C (unknown, similar to glucan 1,4-glucosidase); YJR127C (ZMS1, similar to Arp1p, an N-acetyltransferase); YKL109W (HAP4, transcriptional activator protein involved in activation of CCAAT box-containing genes); YBL015w (ACH1, acetyl-coenzyme A hydrolase); YPR160W (GPH1, glycogen phosphorylase); YAL039C (CYC3, cytochrome c heme lyase); YML116W (ATR1, member of major facilitator superfamily); YCL009C (ILV6, acetolactate synthase regulatory subunit); YDR281C (unknown); YGL121C (unknown); YKL071w (unknown, similar to bacterial protein csgA); YLR311C (unknown); YER037w (unknown); YOR248W (unknown). *Names marked by an asterisk indicate open reading frames for which at least one hybridization of the set indicated a slightly less than twofold change in abundance.

DETAILED DESCRIPTION OF THE INVENTION

[0008] In addition to measuring the inhibitory effects of purine derivatives in kinase assays and assays of cell growth, their effects on the mRNA levels of nearly all yeast genes were determined with high-density oligonucleotide expression arrays (17, 18). These arrays (19, 20) make it possible to measure quantitatively and in parallel mRNA levels for a very large number of genes after any chemical, environmental, or genetic perturbation. Because purvalanol analogs inhibit both human and S. cerevisiae CDKs, transcript profiles were obtained in yeast, where they can be measured on a genome-wide scale.

[0009] Compounds 52 and flavopiridol were profiled to examine the effects of two structurally different Cdc28p active site inhibitors on gene expression. Compound 52Me was profiled as a control to determine which transcriptional changes result from treatment with a structurally similar compound with greatly diminished CDK activity. Yeast cultures were grown to late logarithmic phase (15) and treated with 25 μM concentrations of the inhibitors for 2 hours, after which cellular polyadenylated mRNA was isolated and converted to biotin-labeled complementary RNA (cRNA) (17, 18). The labeled cRNA was then hybridized to a set of four arrays containing more than 260,000 25-nucleotide oligomers (20).

[0010] Out of more than 6200 genes monitored, 194 (3% of transcripts), 2 (0.03% of transcripts), and 132 (2% of transcripts) showed a greater than twofold change in transcript level when treated with 52, 52Me, or flavopiridol, respectively (21). Consistent with the diminished activity of 52Me both in vivo and in vitro, far fewer transcripts were affected by compound 52Me than by the CDK inhibitors. Of the 63 transcripts that changed in response to both CDK inhibitors 52 and flavopiridol, only nine were down-regulated, five of which (CLB1, CLB2, HTA2, HTB2, EGT2) were associated with cell cycle progression (FIG. 3A). The transcript encoded by CLB1 (G2 cyclin, implicated in the transition into mitosis) showed a significant decrease, consistent with inhibition of the Cdc28p-Clb1/2p kinase, which is involved in a positive feedback loop driving CLB1/2 transcription (22). Similarly, CDK activity has been implicated in transcriptional regulation of histone genes including HTA2 and HTB2 (23), and EGT2, a gene involved in the timing of cell separation after cytokinesis.
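By way of illustration only, the twofold-change criterion described above might be applied to array data along the following lines. This is a minimal sketch and not the analysis actually used: the function names, the intensity floor, and the example intensities are hypothetical, and GENE_X is a placeholder rather than a real open reading frame.

```python
def changed_transcripts(treated, control, fold=2.0, floor=1.0):
    """Return {gene: signed fold change} for genes changing more than `fold`-fold.

    `treated` and `control` map gene names to hybridization intensities.
    Intensities below `floor` are raised to `floor` (an assumption made here
    to avoid division by zero). Positive values are inductions, negative
    values are decreases.
    """
    changed = {}
    for gene, t in treated.items():
        t = max(t, floor)
        c = max(control[gene], floor)
        ratio = t / c
        if ratio > fold:
            changed[gene] = ratio
        elif ratio < 1.0 / fold:
            changed[gene] = -1.0 / ratio
    return changed


def percent_changed(n_changed, n_monitored):
    """Percentage of monitored genes that changed."""
    return 100.0 * n_changed / n_monitored


if __name__ == "__main__":
    # Hypothetical intensities, for illustration only.
    treated = {"CLB1": 50.0, "HSP26": 900.0, "GENE_X": 1000.0}
    control = {"CLB1": 200.0, "HSP26": 300.0, "GENE_X": 1100.0}
    print(changed_transcripts(treated, control))
    # Counts taken from the text: 194 of roughly 6200 genes for compound 52.
    print(round(percent_changed(194, 6200)))
```

With the counts given in the text, 194, 2, and 132 changed transcripts out of roughly 6200 correspond to about 3%, 0.03%, and 2%, respectively.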

[0011] Another set of genes that are clearly affected by both 52 and flavopiridol (but not by 52Me) are ones involved in phosphate metabolism, consistent with the observed in vitro inhibition of Pho85p (FIG. 3B). Intracellular phosphate levels in yeast are monitored by a system that relies on the Pho85p kinase complex to modulate the activity of a transcription factor or factors that regulate a variety of genes, including a secreted acid phosphatase (Pho5p) (24), genes involved in the stress response (the heat shock protein HSP12 and ubiquitin UBI4), and genes involved in glycogen metabolism. Proteins whose transcript levels were observed to increase for 52 or flavopiridol that are consistent with inhibition of the Pho85p kinase include Pho80p (whose transcription is known to be repressed by active Pho85), Pho81p (an endogenous Pho85-Pho80 inhibitor), Pho84p (a phosphate permease), Pho5p, CTT1p, HSP12p, and UBI4 (25). Notably absent from this list is glycogen synthase (GSY2) (26), despite the large number of other glycogen metabolism mRNAs that change. Dissecting the transcriptional consequences of Pho85 inhibition (27) is additionally complicated because Pho85p associates with a large number of other cyclins (for example, Pcl1p-Pcl8p) (Z) to yield complexes of unknown function that may also be subject to inhibition.

[0012] Compound 52 and flavopiridol also affect the transcript levels of many genes involved in cellular metabolism. For example, genes involved in glycolysis (PFK26 and YAL061W, an alcohol dehydrogenase), the citric acid cycle (ALD4), glycogen metabolism (PGM2 and YPR184W, a putative debranching enzyme), gluconeogenesis (PCK1), and a probable sugar transporter (HXT5), were induced. Other changes in transcript levels that were in common to both compounds and are likely to be associated with drug exposure include up-regulation of a number of genes encoding members of the ATP-binding cassette superfamily and other transport proteins (PDR10, PDR15), cell wall glycoproteins (YER150w), and cell wall proteins implicated in increased drug resistance (GSC2) (29); genes involved in vacuole endocytosis and regulation (YPT53, PMC1); and several heat shock genes (HSP26, HSP30, HSP82, HSP104, SSE2). Additional genes with changes in common to both compounds include a GTP- and ATP-binding protein (YDL223c) that putatively binds microtubules, 1-myo-inositol-1-phosphate synthase (INO1), and 40 genes of unknown function. Very few of the 52- and flavopiridol-inducible genes were significantly induced by 52Me, suggesting that many of the drug-sensing mechanisms may respond to signals associated with the function rather than the structure of the drug.

[0013] Although Cdc28p is the intended target of both 52 and flavopiridol, more than half of the mRNA changes that result from exposure to the two compounds are distinct. For example, of the ~50 genes whose transcript levels were decreased at least threefold in response to 52, 14 were ribosomal proteins (including RPL4A, RPL26B, RPS24A). In contrast, no ribosomal protein transcript levels decreased more than threefold after treatment with flavopiridol. These results suggest that the two compounds may inhibit Cdc28p function (10) or affect pathways involving Cdc28p kinase activity to different degrees. Alternatively, the differential effects of the two compounds may result from different intracellular concentrations or from their effects on other cellular targets not specifically examined in vitro. Given the relatively large number of transcripts that are differentially affected by these two CDK inhibitors, we examined the transcriptional consequences of a genetic mutation in the Cdc28p kinase. Because CDC28 is an essential gene, the transcript profiles of two cdc28 temperature-sensitive alleles [cdc28-4 and cdc28-13 (30)] and their isogenic wild-type strains were measured under permissive growth conditions (25° C.) in which the degree of growth inhibition approximates that observed at the concentrations used in the inhibitor profile experiments (31). The mutation leading to a reduction in Cdc28p kinase activity in the cdc28-4 mutant under permissive growth conditions (32) might be expected to simulate the effects of chemical inhibition.

[0014] Approximately 100 mRNAs in the cdc28-4 strain exhibited more than twofold inductions over the wild type (FIG. 3C). Only two of the cell cycle-associated genes (histones HTA1 and HTA2) that changed in response to flavopiridol or 52 were affected in this mutant (33). Instead, as with flavopiridol and 52, a number of metabolic genes involved in glycogen synthesis, the citric acid cycle, gluconeogenesis, and the glyoxylate cycle were induced (FIG. 3C). Consistent with these changes is the induction of the HAP4 transcription factor, which has been implicated in the regulation of many respiration genes (34).

[0015] Another class of transcripts induced in cdc28-4 were for genes involved in stress signaling (35), as well as heat shock elements, stress response elements, and members of the major facilitator superfamily. Other transcripts that were also affected by CDC28 mutation and in the small-molecule experiments include virtually all of the transcription factors and many of the metabolic, biosynthetic, and stress response genes as well as a set of unknown genes, some of which may be linked to cell cycle regulation. However, there were also a number of genes in these functional categories that showed significant changes only for the cdc28-4 mutant, including a protein with transmembrane domains (YOL155C), metabolic genes (ACH1), and a variety of proteins of unknown function. The transcriptional responses to this single point mutation in CDC28 can be interpreted as cellular responses that tend to mitigate the effects of this alteration. Complete inactivation of Cdc28p kinase activity, rather than the partial inhibition at 25° C., may result in more cell cycle-related transcript changes. However, a host of additional changes associated with cell cycle arrest and secondary consequences of heat shock (required to induce arrest) are likely to appear as well, and these changes may complicate interpretation of the profile results.

[0016] Our current experimental design does not allow us to definitively identify the primary target or targets of inhibition by flavopiridol or 52. However, most of the genes that were commonly down-regulated by the two compounds are known to be involved in cell cycle progression and are affected in a way that is consistent with inhibition of Cdc28p activity. The transcript profiles also show distinct and reproducible differences in the effects of the two compounds despite their similar in vitro activity. Profiles of this sort may prove useful in evaluating the selectivity of drug candidates and in identifying proteins whose inhibition might specifically potentiate the effects of a primary drug. The lack of correspondence in the changes of mRNA transcript levels resulting from chemical and genetic inactivation underscores the intrinsic differences in these methods for modulating biological function.

[0017] Given the large number of purine-dependent cellular processes, purine libraries may serve as a rich source of inhibitors for many different protein targets. Indeed, purine analogs have been identified that selectively inhibit JNK kinase and glycogen synthase kinase (36, 37). By screening these libraries for their effects in whole-cell assays, it should be possible to search for compounds with a wide variety of activities (39). Both gene expression profiles and differential gene expression libraries should facilitate identification and characterization of targets (39). These and other approaches to generating selective inhibitors of different cellular processes should complement genetic methods in the study of cellular function.

[0018] Based on the results reported herein, a number of different combinations of oligonucleotide probes are determined to be useful for drug screening and identification purposes. Thus different combinations of probes can be used to test the effects that test compounds have on gene expression in cells. The cells may be mammalian, such as human, or other eukaryote, such as yeast. Although yeast genes and cells are exemplified above, the human homologues are known in many cases. Because the functions of many of these genes are so essential for cells, they are believed to be extremely conserved among species, especially among eukaryotes.

[0019] The oligonucleotide probes can be used in any hybridization assay, solution or solid phase. Preferably the assays are done on a solid phase. More preferably the probes are bound to a solid support which is an array. Any number of probes can be used which specifically hybridize to genes which are affected by at least one of: compound 52, flavopiridol, and a cdc28-4 mutation. The direction of the effect may be either up- or down-regulation. The same direction of effect may be caused by all three agents, or any combination of ups and downs or no-effects.

[0020] As is known in the art, an oligonucleotide probe typically comprises at least 10 contiguous nucleotides of a gene sequence, and preferably 11, 13, 15, 17, 21, 25 or 30 nucleotides. Probes are desirably labeled with a moiety which is either radioactive, enzymatically detectable, antigenically detectable, or fluorometrically detectable.

[0021] The sets of probes of the present invention which detect genes which are regulated by compound 52, flavopiridol, or a cdc28-4 mutation may be present in larger groups of probes which are not so regulated. Preferably at least 10, 20, 40, 60, 80, 90 or 100% of the probes are those which are so regulated. The size of the sets of probes may vary greatly. The sets of probes may comprise at least 2, 3, 5, 7, 9, 11, 20, or 30 probes. They may comprise not more than 10, 20, 30, 100, 1000, or 10000. The upper and lower bounds of the set of probes are always chosen so that the set comprises at least 2 probes regulated as taught herein.

[0022] Drugs, according to the present invention, are any compounds which have an effect on a cell. The drug need not have any proven therapeutic benefit. They may be compounds being screened or further evaluated for their therapeutic benefits. The drugs may be small molecules, i.e., organic or inorganic chemicals. The drugs may be macromolecules or biologicals, such as antibodies, ligands, proteins, nucleic acids, antisense molecules, cytokines, chemokines, ribozymes, etc.

[0023] A set typically refers to an identified grouping of oligonucleotides that are put together in a common container or on a common object. These may be on an array or in a kit together. They are typically separated, either spatially on a solid support such as an array, or in separate vessels, such as vials or tubes. According to the present invention, at least 5% of the oligonucleotides or probes in a set are portions of genes which are up-regulated or down-regulated by compound 52, flavopiridol and/or a cdc28 mutant. Preferably more than 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% of such oligonucleotides or probes in the set represent genes which are so regulated. Most preferably the genes are those identified in FIG. 3 or Table 3.

[0024] According to the present invention one can compare the specificity of drugs' effects by looking at the number of transcriptional targets which the drugs have and comparing them. More specific drugs will have fewer transcriptional targets. Similar sets of targets of two drugs indicate a similarity of effects. Transcriptional targets of a drug or drugs can be identified as a possible additional direct target for drug development. Similarly, the effects of mutations on transcriptional targets can be used to screen potential drugs. Drugs can be screened for the ability to simulate the transcriptional effects of mutations, to counteract the transcriptional effects of a mutation, or to augment the transcriptional effects of a mutation.
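As a hedged illustration of such a comparison, the sets of transcriptional targets of two drugs can be counted and intersected as sketched below. The function name is hypothetical, the Jaccard index is one possible similarity measure chosen here for illustration (it is not prescribed by the text), and the example target lists are invented subsets used only to exercise the code.

```python
def compare_targets(targets_a, targets_b):
    """Summarize the overlap between two drugs' transcriptional targets.

    Returns the number of targets of each drug, the shared and
    drug-specific targets, and the Jaccard index (shared / union),
    used here as an assumed measure of similarity of effects.
    """
    a, b = set(targets_a), set(targets_b)
    shared = a & b
    union = a | b
    return {
        "n_a": len(a),
        "n_b": len(b),
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Invented example sets; real profiles would contain many more genes.
    drug_a = ["CLB1", "CLB2", "RPL4A"]
    drug_b = ["CLB1", "CLB2", "HSP12", "PHO5"]
    summary = compare_targets(drug_a, drug_b)
    print(summary["shared"], summary["jaccard"])
```

Under this sketch, a drug with a smaller target count would be read as more specific, and a higher overlap score between two drugs as a greater similarity of effects.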

[0025] Comparison of patterns of transcription can be done by a human or by a computer. Transcription data (hybridization data) can be entered into the computer and the patterns can be compared. Both differences and similarities are useful to indicate specificities and downstream affected genes.

[0026] Downstream regulated genes of compound 52, flavopiridol, and cdc28 mutations, as identified herein, can be used in transcriptional screening methods as well as in protein screening methods. Immunological techniques can be used to assess the expression of the protein products of the identified genes. The products of the identified genes can be used directly in drug development programs to identify drugs which inhibit or stimulate the products.

[0027] I. Definitions

[0028] Bind(s) substantially: “Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence.

[0029] Background: The terms “background” or “background signal intensity” refer to hybridization signals resulting from non-specific binding, or other interactions, between the labeled target nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). Background signals may also be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal may be calculated for each target nucleic acid. In a preferred embodiment, background is calculated as the average hybridization signal intensity for the lowest 5% to 10% of the probes in the array, or, where a different background signal is calculated for each target gene, for the lowest 5% to 10% of the probes for each gene. Of course, one of skill in the art will appreciate that where the probes to a particular gene hybridize well and thus appear to be specifically binding to a target sequence, they should not be used in a background signal calculation. Alternatively, background may be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g. probes directed to nucleic acids of the opposite sense or to genes not found in the sample such as bacterial genes where the sample is mammalian nucleic acids). Background can also be calculated as the average signal intensity produced by regions of the array that lack any probes at all.
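The preferred background calculation described above, namely the average intensity of the lowest 5% to 10% of probes, may be sketched as follows. This is an illustrative sketch only: the function name and the rounding of the probe count are assumptions, and the intensities in the example are made up.

```python
def background_signal(intensities, fraction=0.05):
    """Average intensity of the lowest `fraction` of probe signals.

    `fraction` would typically lie between 0.05 and 0.10 per the text.
    At least one probe is always used; the number of probes is rounded
    to the nearest integer (an assumption made for this sketch).
    """
    if not intensities:
        raise ValueError("no probe intensities supplied")
    ordered = sorted(intensities)
    n = max(1, round(fraction * len(ordered)))
    lowest = ordered[:n]
    return sum(lowest) / len(lowest)


if __name__ == "__main__":
    # Made-up intensities 1..100: the lowest 5% are 1, 2, 3, 4 and 5.
    probes = list(range(1, 101))
    print(background_signal(probes))        # 3.0
    print(background_signal(probes, 0.10))  # 5.5
```

The same function could be applied per gene by passing only the probe intensities for that gene, subject to the caveat in the text that well-hybridizing probes should be excluded.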

[0030] Hybridizing specifically to: The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA).

[0031] Introns: noncoding DNA sequences which separate neighboring coding regions. During gene transcription, introns, like exons, are transcribed into RNA but are subsequently removed by RNA splicing.

[0032] Massively Parallel Screening: The phrase “massively parallel screening” refers to the simultaneous screening of at least about 100, preferably about 1000, more preferably about 10,000 and most preferably about 1,000,000 different nucleic acid hybridizations.

[0033] Mismatch control: The terms “mismatch control” or “mismatch probe” refer to a probe whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence. For each mismatch (MM) control in a high-density array there typically exists a corresponding perfect match (PM) probe that is perfectly complementary to the same particular target sequence. The mismatch may comprise one or more bases. While the mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are less desirable as a terminal mismatch is less likely to prevent hybridization of the target sequence. In a particularly preferred embodiment, the mismatch is located at or near the center of the probe such that the mismatch is most likely to destabilize the duplex with the target sequence under the test hybridization conditions.
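For illustration, a single-base mismatch probe with the mismatch at the center can be derived from a perfect match probe as sketched below. The choice of substituting the complementary base at the central position is an assumption made for this sketch; the text requires only that the mismatch lie at or near the center. The function name and the example 25-mer are hypothetical.

```python
# DNA base complements, used here to choose the substituted base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match):
    """Return a mismatch (MM) probe for a perfect match (PM) DNA probe.

    The base at the central position is replaced by its complement
    (an assumed substitution rule), leaving all other bases unchanged.
    """
    pm = perfect_match.upper()
    mid = len(pm) // 2
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


if __name__ == "__main__":
    pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-nucleotide probe
    mm = mismatch_probe(pm)
    print(pm)
    print(mm)  # differs from pm only at the 13th base (index 12)
```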

[0034] mRNA or transcript: The term “mRNA” refers to transcripts of a gene. Transcripts are RNA including, for example, mature messenger RNA ready for translation and products of various stages of transcript processing. Transcript processing may include splicing, editing and degradation.

[0035] Nucleic Acid: The terms “nucleic acid” or “nucleic acid molecule” refer to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, would encompass analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides. An oligonucleotide is a single-stranded nucleic acid of 2 to n bases, where n may be greater than 500 to 1000. Nucleic acids may be cloned or synthesized using any technique known in the art. They may also include non-naturally occurring nucleotide analogs, such as those which are modified to improve hybridization and peptide nucleic acids.

[0036] Nucleic acid encoding a regulatory molecule: The regulatory molecule may be DNA, RNA or protein. Thus for example DNA sites which bind protein or other nucleic acid molecules are included within the class of regulatory molecules encoded by a nucleic acid.

[0037] Perfect match probe: The term “perfect match probe” refers to a probe that has a sequence that is perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The perfect match (PM) probe can be a “test probe”, a “normalization control” probe, an expression level control probe and the like. A perfect match control or perfect match probe is, however, distinguished from a “mismatch control” or “mismatch probe.”

[0038] Probe: As used herein a “probe” is defined as a nucleic acid, capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. As used herein, a probe may include natural (i.e. A, G, U, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in probes may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.

[0039] Target nucleic acid: The term “target nucleic acid” refers to a nucleic acid (often derived from a biological sample), to which the probe is designed to specifically hybridize. It is either the presence or absence of the target nucleic acid that is to be detected, or the amount of the target nucleic acid that is to be quantified. The target nucleic acid has a sequence that is complementary to the nucleic acid sequence of the corresponding probe directed to the target. The term target nucleic acid may refer to the specific subsequence of a larger nucleic acid to which the probe is directed or to the overall sequence (e.g., gene or mRNA) whose expression level it is desired to detect. The difference in usage will be apparent from context.

[0040] Stringent conditions: The term “stringent conditions” refers to conditions under which a probe will hybridize to its target subsequence, but with only insubstantial hybridization to other sequences or to other sequences such that the difference may be identified. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.

[0041] Subsequence: “Subsequence” refers to a sequence of nucleic acids that comprise a part of a longer sequence of nucleic acids.

[0042] Thermal melting point (Tm): The Tm is the temperature, under defined ionic strength, pH, and nucleic acid concentration, at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. (As the target sequences are generally present in excess, at Tm, 50% of the probes are occupied at equilibrium.) Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.

[0043] Quantifying: The term “quantifying” when used in the context of quantifying transcription levels of a gene can refer to absolute or to relative quantification. Absolute quantification may be accomplished by inclusion of known concentration(s) of one or more target nucleic acids (e.g. control nucleic acids such as Bio B or with known amounts of the target nucleic acids themselves) and referencing the hybridization intensity of unknowns with the known target nucleic acids (e.g. through generation of a standard curve). Alternatively, relative quantification can be accomplished by comparison of hybridization signals between two or more genes, or between two or more treatments to quantify the changes in hybridization intensity and, by implication, transcription level.
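As one hedged example of absolute quantification through a standard curve, known control concentrations and their hybridization intensities can be fit by ordinary least squares and the fit inverted for an unknown. The linear response is an assumption of this sketch (real array signals may saturate), and the function names, concentration units, and data points are all hypothetical.

```python
def fit_standard_curve(concentrations, intensities):
    """Least-squares line: intensity = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(intensity, slope, intercept):
    """Invert the standard curve to estimate an unknown concentration."""
    return (intensity - intercept) / slope


def relative_change(treated_intensity, control_intensity):
    """Relative quantification: ratio of signals between two treatments."""
    return treated_intensity / control_intensity


if __name__ == "__main__":
    # Hypothetical spiked control data (arbitrary concentration units).
    known_conc = [1.0, 2.0, 4.0, 8.0]
    known_signal = [110.0, 210.0, 410.0, 810.0]
    slope, intercept = fit_standard_curve(known_conc, known_signal)
    print(estimate_concentration(510.0, slope, intercept))
```

Relative quantification needs no standard curve; `relative_change` simply compares the signals from two treatments for the same gene.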

[0044] Sequence identity: The “percentage of sequence identity” or “sequence identity” is determined by comparing two optimally aligned sequences or subsequences over a comparison window or span, wherein the portion of the polynucleotide sequence in the comparison window may optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical subunit (e.g. nucleic acid base or amino acid residue) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Percentage sequence identity when calculated using the programs GAP or BESTFIT (see below) is calculated using default gap weights.
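The percentage calculation just described can be sketched for two sequences that have already been optimally aligned. The sketch assumes that the aligned strings are of equal length, that gaps are written as "-", and that the comparison window is the full alignment; the function name and example sequences are hypothetical, and the alignment step itself is not shown.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an existing alignment.

    Matched positions are those where both sequences carry the same
    non-gap subunit; the total is the number of positions in the window.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty comparison window")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical 10-position alignment with one gap and one substitution.
    print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # 80.0
```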

[0045] Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988), by computerized implementations of these algorithms (including, but not limited to CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif., GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis., USA), or by inspection. In particular, methods for aligning sequences using the CLUSTAL program are well described by Higgins and Sharp in Gene, 73: 237-244 (1988) and in CABIOS 5: 151-153 (1989).

[0046] This invention provides methods and compositions for interrogating the genetic network and for studying the influence on expression of candidate drugs and mutations. The methods involve quantifying the level of expression of a large number of genes. In some preferred embodiments, a high density oligonucleotide array is used to hybridize with a target nucleic acid sample to detect the expression level of a large number of genes, preferably more than 10, more preferably more than 100, and most preferably more than 1000 genes.

[0047] Activity of a gene is reflected by the activity of its product(s): the proteins or other molecules encoded by the gene. Those product molecules perform biological functions. Directly measuring the activity of a gene product is, however, often difficult for certain genes. Instead, the immunological activities or the amount of the final product(s) or its peptide processing intermediates are determined as a measurement of the gene activity. More frequently, the amount or activity of intermediates, such as transcripts, RNA processing intermediates, or mature mRNAs are detected as a measurement of gene activity.

[0048] In many cases, the form and function of the final product(s) of a gene is unknown. In those cases, the activity of a gene is measured conveniently by the amount or activity of transcript(s), RNA processing intermediate(s), mature mRNA(s) or its protein product(s) or functional activity of its protein product(s).

[0049] Any methods that measure the activity of a gene are useful for at least some embodiments of this invention. For example, traditional Northern blotting and hybridization, nuclease protection, RT-PCR and differential display have been used for detecting gene activity. Those methods are useful for some embodiments of the invention. However, this invention is most useful in conjunction with methods for detecting the expression of a large number of genes.

[0050] High density arrays are particularly useful for monitoring the expression control at the transcriptional, RNA processing and degradation level. The fabrication and application of high density arrays in gene expression monitoring have been disclosed previously in, for example, WO 97/10365, WO 92/10588, U.S. application Ser. No. 08/772,376 filed Dec. 23, 1996; Ser. No. 08/529,115 filed on Sep. 15, 1995; Ser. No. 08/168,904 filed Dec. 15, 1993; Ser. No. 07/624,114 filed on Dec. 6, 1990; and Ser. No. 07/362,901 filed Jun. 7, 1990, all incorporated herein for all purposes by reference. In some embodiments using high density arrays, high density oligonucleotide arrays are synthesized using methods such as the Very Large Scale Immobilized Polymer Synthesis (VLSIPS) disclosed in U.S. Pat. No. 5,445,934, incorporated herein for all purposes by reference. Each oligonucleotide occupies a known location on a substrate. A nucleic acid target sample is hybridized with a high density array of oligonucleotides and then the amount of target nucleic acids hybridized to each probe in the array is quantified. One preferred quantifying method is to use a confocal microscope and fluorescent labels. The GeneChip® system (Affymetrix, Santa Clara, Calif.) is particularly suitable for quantifying the hybridization; however, it will be apparent to those of skill in the art that any similar systems or other effectively equivalent detection methods can also be used.

[0051] High density arrays are suitable for quantifying small variations in expression levels of a gene in the presence of a large population of heterogeneous nucleic acids. Such high density arrays can be fabricated either by de novo synthesis on a substrate or by spotting or transporting nucleic acid sequences onto specific locations of a substrate. Nucleic acids are purified and/or isolated from biological materials, such as a bacterial plasmid containing a cloned segment of sequence of interest. Suitable nucleic acids are also produced by amplification of templates. As a nonlimiting illustration, polymerase chain reaction, and/or in vitro transcription, are suitable nucleic acid amplification methods.

[0052] Synthesized oligonucleotide arrays are particularly preferred for this invention. Oligonucleotide arrays have numerous advantages, as opposed to other methods, such as efficiency of production, reduced intra- and inter-array variability, increased information content and high signal-to-noise ratio.

[0053] Preferred high density arrays for gene function identification and genetic network mapping comprise greater than about 100, preferably greater than about 1000, more preferably greater than about 16,000 and most preferably greater than 65,000 or 250,000 or even greater than about 1,000,000 different oligonucleotide probes, preferably in less than 1 cm² of surface area. The oligonucleotide probes range from about 5 to about 50 or about 500 nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in length.

[0054] Massive Parallel Gene Expression Monitoring

[0055] One preferred method for massive parallel gene expression monitoring is based upon high density nucleic acid arrays. Nucleic acid array methods for monitoring gene expression are disclosed and discussed in detail in PCT Application WO 92/10588 (published on Jun. 25, 1992), incorporated herein by reference for all purposes.

[0056] Generally those methods of monitoring gene expression involve (a) providing a pool of target nucleic acids comprising RNA transcript(s) of one or more target gene(s), or nucleic acids derived from the RNA transcript(s); (b) hybridizing the nucleic acid sample to a high density array of probes; and (c) detecting the hybridized nucleic acids and calculating a relative and/or absolute expression (transcription, RNA processing or degradation) level.

[0057] (A) Providing a Nucleic Acid Sample

[0058] One of skill in the art will appreciate that it is desirable to have nucleic acid samples containing target nucleic acid sequences that reflect the transcripts of interest. Therefore, suitable nucleic acid samples may contain transcripts of interest. Alternatively, suitable nucleic acid samples may contain nucleic acids derived from the transcripts of interest. As used herein, a nucleic acid derived from a transcript refers to a nucleic acid for whose synthesis the mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a cDNA reverse transcribed from a transcript, an RNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all derived from the transcript and detection of such derived products is indicative of the presence and/or abundance of the original transcript in a sample. Thus, suitable samples include, but are not limited to, transcripts of the gene or genes, cDNA reverse transcribed from the transcript, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like.

[0059] Transcripts, as used herein, may include, but are not limited to, pre-mRNA nascent transcript(s), transcript processing intermediates, mature mRNA(s) and degradation products. It is not necessary to monitor all types of transcripts to practice this invention. For example, one may choose to practice the invention to measure the mature mRNA levels only.

[0060] In one embodiment, such a sample is a homogenate of cells or tissues or other biological samples. Preferably, such a sample is a total RNA preparation of a biological sample. More preferably in some embodiments, such a nucleic acid sample is the total mRNA isolated from a biological sample. Those of skill in the art will appreciate that the total mRNA prepared with most methods includes not only the mature mRNA, but also the RNA processing intermediates and nascent pre-mRNA transcripts. For example, total mRNA purified with a poly (dT) column contains RNA molecules with poly (A) tails. Those polyA⁺ RNA molecules could be mature mRNA, RNA processing intermediates, nascent transcripts or degradation intermediates.

[0061] Biological samples may be of any biological tissue or fluid or cells from any organism. Frequently the sample will be a “clinical sample” which is a sample derived from a patient. Clinical samples provide a rich source of information regarding the various states of genetic network or gene expression. Some embodiments of the invention are employed to detect mutations and to identify the phenotype of mutations. Such embodiments have extensive applications in clinical diagnostics and clinical studies. Typical clinical samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues, such as frozen sections or formalin fixed sections taken for histological purposes.

[0062] Another typical source of biological samples is cell cultures, where gene expression states can be manipulated to explore the relationship among genes. In one aspect of the invention, methods are provided to generate biological samples reflecting a wide variety of states of the genetic network.

[0063] One of skill in the art would appreciate that it is desirable to inhibit or destroy RNase present in homogenates before homogenates can be used for hybridization. Methods of inhibiting or destroying nucleases are well known in the art. In some preferred embodiments, cells or tissues are homogenized in the presence of chaotropic agents to inhibit nuclease. In some other embodiments, RNase is inhibited or destroyed by heat treatment followed by proteinase treatment.

[0064] Methods of isolating total mRNA are also well known to those of skill in the art. For example, methods of isolation and purification of nucleic acids are described in detail in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, P. Tijssen, ed. Elsevier, N.Y. (1993).

[0065] In a preferred embodiment, the total RNA is isolated from a given sample using, for example, an acid guanidinium-phenol-chloroform extraction method and polyA⁺ mRNA is isolated by oligo(dT) column chromatography or by using (dT) on magnetic beads (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989), or Current Protocols in Molecular Biology, F. Ausubel et al., ed. Greene Publishing and Wiley-Interscience, N.Y. (1987)).

[0066] Frequently, it is desirable to amplify the nucleic acid sample prior to hybridization. One of skill in the art will appreciate that whatever amplification method is used, if a quantitative result is desired, care must be taken to use a method that maintains or controls for the relative frequencies of the amplified nucleic acids to achieve quantitative amplification.

[0067] Methods of “quantitative” amplification are well known to those of skill in the art. For example, quantitative PCR involves simultaneously co-amplifying a known quantity of a control sequence using the same primers. This provides an internal standard that may be used to calibrate the PCR reaction. The high density array may then include probes specific to the internal standard for quantification of the amplified nucleic acid.

[0068] One preferred internal standard is a synthetic AW106 cRNA. The AW106 cRNA is combined with RNA isolated from the sample according to standard techniques known to those of skill in the art. The RNA is then reverse transcribed using a reverse transcriptase to provide copy DNA. The cDNA sequences are then amplified (e.g., by PCR) using labeled primers. The amplification products are separated, typically by electrophoresis, and the amount of radioactivity (proportional to the amount of amplified product) is determined. The amount of mRNA in the sample is then calculated by comparison with the signal produced by the known AW106 RNA standard. Detailed protocols for quantitative PCR are provided in PCR Protocols, A Guide to Methods and Applications, Innis et al., Academic Press, Inc. N.Y., (1990).
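The comparison with the internal standard can be sketched as a simple proportion. The sketch assumes that the target and the co-amplified standard are amplified with equal efficiency, so that the ratio of their signals equals the ratio of their starting amounts; the function name, units and numbers are illustrative assumptions rather than part of the cited protocols.

```python
def estimate_target_amount(target_signal, standard_signal, standard_amount):
    """Estimate the starting amount of a target from an internal standard.

    Assumes equal amplification efficiency for target and standard, so
    signal ratio == starting-amount ratio (an illustrative simplification).
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return standard_amount * target_signal / standard_signal
```

For example, if 2.0 units of standard give a signal of 1500 and the target gives 3000, the estimated starting amount of target is 4.0 units.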

[0069] Other suitable amplification methods include, but are not limited to, polymerase chain reaction (PCR) (Innis, et al., PCR Protocols. A Guide to Methods and Applications. Academic Press, Inc. San Diego, (1990)), ligase chain reaction (LCR) (see Wu and Wallace, Genomics, 4: 560 (1989), Landegren, et al., Science, 241: 1077 (1988) and Barringer, et al., Gene, 89: 117 (1990)), transcription amplification (Kwoh, et al., Proc. Natl. Acad. Sci. USA, 86: 1173 (1989)), and self-sustained sequence replication (Guatelli, et al., Proc. Natl. Acad. Sci. USA, 87: 1874 (1990)).

[0070] Cell lysates or tissue homogenates often contain a number of inhibitors of polymerase activity. Therefore, RT-PCR typically incorporates preliminary steps to isolate total RNA or mRNA for subsequent use as an amplification template. A one-tube mRNA capture method may be used to prepare poly(A)⁺ RNA samples suitable for immediate RT-PCR in the same tube (Boehringer Mannheim). The captured mRNA can be directly subjected to RT-PCR by adding a reverse transcription mix and, subsequently, a PCR mix.

[0071] In a particularly preferred embodiment, the sample mRNA is reverse transcribed with a reverse transcriptase and a primer consisting of oligo(dT) and a sequence encoding the phage T7 promoter to provide a single stranded DNA template. The second DNA strand is polymerized using a DNA polymerase. After synthesis of double-stranded cDNA, T7 RNA polymerase is added and RNA is transcribed from the cDNA template. Successive rounds of transcription from each single cDNA template result in amplified RNA. Methods of in vitro polymerization are well known to those of skill in the art (see, e.g., Sambrook, supra) and this particular method is described in detail by Van Gelder, et al., Proc. Natl. Acad. Sci. USA, 87: 1663-1667 (1990), who demonstrate that in vitro amplification according to this method preserves the relative frequencies of the various RNA transcripts. Moreover, Eberwine et al., Proc. Natl. Acad. Sci. USA, 89: 3010-3014, provide a protocol that uses two rounds of amplification via in vitro transcription to achieve greater than 10⁶ fold amplification of the original starting material, thereby permitting expression monitoring even where biological samples are limited.

[0072] It will be appreciated by one of skill in the art that the direct transcription method described above provides an antisense RNA (aRNA) pool. Where antisense RNA is used as the target nucleic acid, the oligonucleotide probes provided in the array are chosen to be complementary to subsequences of the antisense nucleic acids. Conversely, where the target nucleic acid pool is a pool of sense nucleic acids, the oligonucleotide probes are selected to be complementary to subsequences of the sense nucleic acids. Finally, where the nucleic acid pool is double stranded, the probes may be of either sense as the target nucleic acids include both sense and antisense strands.

[0073] The protocols cited above include methods of generating pools of either sense or antisense nucleic acids. Indeed, one approach can be used to generate either sense or antisense nucleic acids as desired. For example, the cDNA can be directionally cloned into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid) such that it is flanked by the T3 and T7 promoters. In vitro transcription with the T3 polymerase will produce RNA of one sense (the sense depending on the orientation of the insert), while in vitro transcription with the T7 polymerase will produce RNA having the opposite sense. Other suitable cloning systems include phage lambda vectors designed for Cre-loxP plasmid subcloning (see e.g., Palazzolo et al., Gene, 88: 25-36 (1990)).

[0074] (B) Hybridizing Nucleic Acids to High Density Arrays

[0075] 1. Probe Design

[0076] One of skill in the art will appreciate that an enormous number of array designs are suitable for the practice of this invention. The high density array will typically include a number of probes that specifically hybridize to the sequences of interest. In addition, in a preferred embodiment, the array will include one or more control probes.

[0077] The high density array chip includes “test probes.” Test probes could be oligonucleotides that range from about 5 to about 45 or 5 to about 500 nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in length. In other particularly preferred embodiments the probes are 20 or 25 nucleotides in length. In other preferred embodiments, test probes are double or single strand DNA sequences. DNA sequences are isolated or cloned from natural sources or amplified from natural sources using natural nucleic acid as templates. These probes have sequences complementary to particular subsequences of the genes whose expression they are designed to detect. Thus, the test probes are capable of specifically hybridizing to the target nucleic acid they are to detect.

[0078] In addition to test probes that bind the target nucleic acid(s) of interest, the high density array can contain a number of control probes. The control probes fall into three categories referred to herein as 1) normalization controls; 2) expression level controls; and 3) mismatch controls.

[0079] Normalization controls are oligonucleotide or other nucleic acid probes that are complementary to labeled reference oligonucleotides or other nucleic acid sequences that are added to the nucleic acid sample. The signals obtained from the normalization controls after hybridization provide a control for variations in hybridization conditions, label intensity, “reading” efficiency and other factors that may cause the signal of a perfect hybridization to vary between arrays. In a preferred embodiment, signals (e.g., fluorescence intensity) read from all other probes in the array are divided by the signal (e.g., fluorescence intensity) from the control probes, thereby normalizing the measurements.
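The division by the control signal described above can be sketched as follows. Where several normalization probes are present, the sketch uses their mean as the divisor; that choice, the function name and the probe labels are assumptions made for illustration.

```python
def normalize_signals(signals, control_signals):
    """Divide each probe signal by the normalization-control signal.

    signals:          dict mapping probe name -> raw intensity
    control_signals:  raw intensities of the normalization control probes;
                      their mean is used as the divisor (an assumption).
    """
    control = sum(control_signals) / len(control_signals)
    if control <= 0:
        raise ValueError("normalization control signal must be positive")
    return {probe: value / control for probe, value in signals.items()}
```

With control probes reading 100 on one array, raw signals of 200 and 50 become 2.0 and 0.5, which can then be compared with similarly normalized values from another array.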

[0080] Virtually any probe may serve as a normalization control. However, it is recognized that hybridization efficiency varies with base composition and probe length. Preferred normalization probes are selected to reflect the average length of the other probes present in the array; however, they can be selected to cover a range of lengths. The normalization control(s) can also be selected to reflect the (average) base composition of the other probes in the array; however, in a preferred embodiment, only one or a few normalization probes are used and they are selected such that they hybridize well (i.e. no secondary structure) and do not match any target-specific probes.

[0081] Expression level controls are probes that hybridize specifically with constitutively expressed genes in the biological sample. Virtually any constitutively expressed gene provides a suitable target for expression level controls. Typically expression level control probes have sequences complementary to subsequences of constitutively expressed “housekeeping genes” including, but not limited to, the β-actin gene, the transferrin receptor gene, the GAPDH gene, and the like.

[0082] Mismatch controls may also be provided for the probes to the target genes, for expression level controls or for normalization controls. Mismatch controls are oligonucleotide probes or other nucleic acid probes identical to their corresponding test or control probes except for the presence of one or more mismatched bases. A mismatched base is a base selected so that it is not complementary to the corresponding base in the target sequence to which the probe would otherwise specifically hybridize. One or more mismatches are selected such that under appropriate hybridization conditions (e.g. stringent conditions) the test or control probe would be expected to hybridize with its target sequence, but the mismatch probe would not hybridize (or would hybridize to a significantly lesser extent). Preferred mismatch probes contain a central mismatch. Thus, for example, where a probe is a 20 mer, a corresponding mismatch probe will have the identical sequence except for a single base mismatch (e.g., substituting a G, a C or a T for an A) at any of positions 6 through 14 (the central mismatch).
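A mismatch probe of the kind just described can be derived from its perfect match probe as sketched below. The text permits any non-identical base at the mismatch position; this sketch substitutes the Watson-Crick complement of the original base, which is only one permissible choice and is an assumption made for illustration, as are the function name and the default position.

```python
# One permissible substitution rule (an assumption): swap in the complement.
_SUBSTITUTE = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match, position=None):
    """Return a copy of the probe with a single mismatched base.

    position is 1-based. By default the mismatch is placed just past the
    middle of the probe (position 11 for a 20-mer, which lies within the
    central 6-through-14 window named in the text).
    """
    probe = perfect_match.upper()
    if position is None:
        position = len(probe) // 2 + 1
    if not 1 <= position <= len(probe):
        raise ValueError("position lies outside the probe")
    i = position - 1
    return probe[:i] + _SUBSTITUTE[probe[i]] + probe[i + 1:]
```

For the hypothetical 20-mer ACGTACGTACGTACGTACGT, the default call replaces the G at position 11 with a C and leaves the other 19 bases unchanged.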

[0083] Mismatch probes thus provide a control for non-specific binding or cross-hybridization to a nucleic acid in the sample other than the target to which the probe is directed. Mismatch probes thus indicate whether a hybridization is specific or not. For example, if the target is present the perfect match probes should be consistently brighter than the mismatch probes. In addition, if all central mismatches are present, the mismatch probes can be used to detect a mutation. The difference in intensity between the perfect match and the mismatch probe (I(PM) - I(MM)) provides a good measure of the concentration of the hybridized material.
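The two uses of mismatch probes noted above, checking that perfect match probes are consistently brighter and taking the intensity difference as a measure of hybridized material, can be sketched for a set of probe pairs directed to one gene. Averaging the differences over the pairs is an illustrative choice, and the function names and intensities are assumptions.

```python
def average_difference(pairs):
    """Mean of (perfect match - mismatch) intensity over all probe pairs.

    pairs: list of (pm_intensity, mm_intensity) tuples for one gene.
    """
    diffs = [pm - mm for pm, mm in pairs]
    return sum(diffs) / len(diffs)


def fraction_pm_brighter(pairs):
    """Fraction of pairs in which the perfect match probe is brighter."""
    return sum(1 for pm, mm in pairs if pm > mm) / len(pairs)
```

For hypothetical pairs (500, 100), (300, 150), (80, 90) and (400, 100), the average difference is 210 and three of the four perfect match probes are brighter than their mismatch partners.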

[0084] The high density array may also include sample preparation/amplification control probes. These are probes that are complementary to subsequences of control genes selected because they do not normally occur in the nucleic acids of the particular biological sample being assayed. Suitable sample preparation/amplification control probes include, for example, probes to bacterial genes (e.g., Bio B) where the sample in question is a biological sample from a eukaryote.

[0085] The RNA sample is then spiked with a known amount of the nucleic acid to which the sample preparation/amplification control probe is directed before processing. Quantification of the hybridization of the sample preparation/amplification control probe then provides a measure of alteration in the abundance of the nucleic acids caused by processing steps (e.g. PCR, reverse transcription, in vitro transcription, etc.).

[0086] In a preferred embodiment, oligonucleotide probes in the high density array are selected to bind specifically to the nucleic acid target to which they are directed with minimal non-specific binding or cross-hybridization under the particular hybridization conditions utilized. Because the high density arrays of this invention can contain in excess of 1,000,000 different probes, it is possible to provide every probe of a characteristic length that binds to a particular nucleic acid sequence. Thus, for example, the high density array can contain every possible 20-mer sequence complementary to an IL-2 mRNA.

[0087] However, there may exist 20-mer subsequences that are not unique to the IL-2 mRNA. Probes directed to these subsequences are expected to cross-hybridize with occurrences of their complementary sequence in other regions of the sample genome. Similarly, other probes simply may not hybridize effectively under the hybridization conditions (e.g., due to secondary structure, or interactions with the substrate or other probes). Thus, in a preferred embodiment, the probes that show such poor specificity or hybridization efficiency are identified and may not be included either in the high density array itself (e.g., during fabrication of the array) or in the post-hybridization data analysis.
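The idea of providing every probe of a given length complementary to a target, and then setting aside probes whose target subsequence is not unique, can be sketched as follows. The sketch checks uniqueness only by exact string matching against a supplied list of other sequences on the same strand; it does not model secondary structure or hybridization efficiency. All names and the short sequences in the usage note are illustrative assumptions.

```python
from collections import Counter

# U is mapped to A so that an mRNA written as RNA yields a DNA probe.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def windows(seq, length):
    """All overlapping subsequences of the given length, in order."""
    seq = seq.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def tile_probes(target, length=20):
    """Every probe of the given length complementary to the target."""
    return [(start, reverse_complement(w))
            for start, w in enumerate(windows(target, length))]


def specific_probes(target, background, length=20):
    """Keep probes whose target subsequence occurs exactly once in the
    target and nowhere (as an exact match) in the background sequences."""
    counts = Counter(windows(target, length))
    background_windows = set()
    for seq in background:
        background_windows.update(windows(seq, length))
    return [(start, reverse_complement(w))
            for start, w in enumerate(windows(target, length))
            if counts[w] == 1 and w not in background_windows]
```

With a toy target ACGTTGCAAC, a probe length of 4 and a single background sequence GGTTGCAGG, seven probes tile the target and the three whose subsequences also occur in the background (starting at offsets 2, 3 and 4) are dropped.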

[0088] In addition, in a preferred embodiment, expression monitoring arrays are used to identify the presence and expression (transcription) level of genes which are several hundred base pairs long. For most applications it would be useful to identify the presence, absence, or expression level of several thousand to one hundred thousand genes. Because the number of oligonucleotides per array is limited in a preferred embodiment, it is desired to include only a limited set of probes specific to each gene whose expression is to be detected.

[0089] As disclosed in U.S. application Ser. No. 08/772,376, probes as short as 15, 20, or 25 nucleotides are sufficient to hybridize to a subsequence of a gene and, for most genes, there is a set of probes that performs well across a wide range of target nucleic acid concentrations. In a preferred embodiment, it is desirable to choose a preferred or “optimum” subset of probes for each gene before synthesizing the high density array.

[0090] 2. Forming High Density Arrays.

[0091] Methods of forming high density arrays of oligonucleotides, peptides and other polymer sequences with a minimal number of synthetic steps are known. The oligonucleotide analogue array can be synthesized on a solid substrate by a variety of methods, including, but not limited to, light-directed chemical coupling, and mechanically directed coupling. See Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT Publication Nos. WO 92/10092 and WO 93/09668 and U.S. Ser. No. 07/980,523 which disclose methods of forming vast arrays of peptides, oligonucleotides and other molecules using, for example, light-directed synthesis techniques. See also, Fodor et al., Science, 251, 767-77 (1991). These procedures for synthesis of polymer arrays are now referred to as VLSIPS™ procedures. Using the VLSIPS™ approach, one heterogeneous array of polymers is converted, through simultaneous coupling at a number of reaction sites, into a different heterogeneous array. See, U.S. application Ser. Nos. 07/796,243 and 07/980,523.

[0092] The development of VLSIPS™ technology as described in the above-noted U.S. Pat. No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, is considered pioneering technology in the fields of combinatorial synthesis and screening of combinatorial libraries. More recently, patent application Ser. No. 08/082,937, filed Jun. 25, 1993, describes methods for making arrays of oligonucleotide probes that can be used to check or determine a partial or complete sequence of a target nucleic acid and to detect the presence of a nucleic acid containing a specific oligonucleotide sequence.

[0093] In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a glass surface proceeds using automated phosphoramidite chemistry and chip masking techniques. In one specific implementation, a glass surface is derivatized with a silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting group. Photolysis through a photolithographic mask is used selectively to expose functional groups which are then ready to react with incoming 5′-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with those sites which are illuminated (and thus exposed by removal of the photolabile blocking group). Thus, the phosphoramidites only add to those areas selectively exposed from the preceding step. These steps are repeated until the desired array of sequences has been synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide analogues at different locations on the array is determined by the pattern of illumination during synthesis and the order of addition of coupling reagents.
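How the pattern of illumination and the order of reagent addition together determine the sequence at each location can be sketched with a toy simulation. Each step pairs a mask (the set of sites illuminated) with the monomer coupled next; only illuminated sites are extended. The representation of masks as sets of site indices, the function name and the four-site example are assumptions for illustration, and the strings record only the order of coupling, not strand orientation or chemistry.

```python
def synthesize(n_sites, steps):
    """Simulate masked, stepwise synthesis on n_sites array locations.

    steps: list of (mask, monomer) pairs, applied in order, where mask is
           the set of site indices deprotected by illumination in that step.
    Returns the list of sequences built at each site, in coupling order.
    """
    array = [""] * n_sites
    for mask, monomer in steps:
        for site in mask:
            if not 0 <= site < n_sites:
                raise ValueError("mask refers to a site outside the array")
            array[site] += monomer  # only illuminated sites react
    return array
```

Four steps with the masks {0, 1}, {2, 3}, {0, 2} and {1, 3} and the monomers A, C, G and T yield the four distinct dimers AG, AT, CG and CT at sites 0 through 3, illustrating how a small number of coupling steps produces a combinatorial set of sequences.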

[0094] In the event that an oligonucleotide analogue with a polyamide backbone is used in the VLSIPS™ procedure, it is generally inappropriate to use phosphoramidite chemistry to perform the synthetic steps, since the monomers do not attach to one another via a phosphate linkage. Instead, peptide synthetic methods are substituted. See, e.g., Pirrung et al., U.S. Pat. No. 5,143,854.

[0095] Peptide nucleic acids, which comprise a polyamide backbone and the bases found in naturally occurring nucleosides, are commercially available from, e.g., Biosearch, Inc. (Bedford, Mass.). Peptide nucleic acids are capable of binding to nucleic acids with high specificity, and are considered “oligonucleotide analogues” for purposes of this disclosure.

[0096] In addition to the foregoing, additional methods which can be used to generate an array of oligonucleotides on a single substrate are described in co-pending applications Ser. Nos. 07/980,523, filed Nov. 20, 1992, and 07/796,243, filed Nov. 22, 1991 and in PCT Publication No. WO 93/09668. In the methods disclosed in these applications, reagents are delivered to the substrate by either (1) flowing within a channel defined on predefined regions or (2) “spotting” on predefined regions or (3) through the use of photoresist. However, other approaches, as well as combinations of spotting and flowing, may be employed. In each instance, certain activated regions of the substrate are mechanically separated from other regions when the monomer solutions are delivered to the various reaction sites.

[0097] A typical “flow channel” method applied to the compounds and libraries of the present invention can generally be described as follows. Diverse polymer sequences are synthesized at selected regions of a substrate or solid support by forming flow channels on a surface of the substrate through which appropriate reagents flow or in which appropriate reagents are placed. For example, assume a monomer “A” is to be bound to the substrate in a first group of selected regions. If necessary, all or part of the surface of the substrate in all or a part of the selected regions is activated for binding by, for example, flowing appropriate reagents through all or some of the channels, or by washing the entire substrate with appropriate reagents. After placement of a channel block on the surface of the substrate, a reagent having the monomer A flows through or is placed in all or some of the channel(s). The channels provide fluid contact to the first selected regions, thereby binding the monomer A on the substrate directly or indirectly (via a spacer) in the first selected regions.

[0098] Thereafter, a monomer B is coupled to second selected regions, some of which may be included among the first selected regions. The second selected regions will be in fluid contact with a second flow channel(s) through translation, rotation, or replacement of the channel block on the surface of the substrate; through opening or closing a selected valve; or through deposition of a layer of chemical or photoresist. If necessary, a step is performed for activating at least the second regions. Thereafter, the monomer B is flowed through or placed in the second flow channel(s), binding monomer B at the second selected locations. In this particular example, the resulting sequences bound to the substrate at this stage of processing will be, for example, A, B, and AB. The process is repeated to form a vast array of sequences of desired length at known locations on the substrate.

[0099] After the substrate is activated, monomer A can be flowed through some of the channels, monomer B can be flowed through other channels, a monomer C can be flowed through still other channels, etc. In this manner, many or all of the reaction regions are reacted with a monomer before the channel block must be moved or the substrate must be washed and/or reactivated. By making use of many or all of the available reaction regions simultaneously, the number of washing and activation steps can be minimized.

[0100] One of skill in the art will recognize that there are alternative methods of forming channels or otherwise protecting a portion of the surface of the substrate. For example, according to some embodiments, a protective coating such as a hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of the substrate to be protected, sometimes in combination with materials that facilitate wetting by the reactant solution in other regions. In this manner, the flowing solutions are further prevented from passing outside of their designated flow paths.

[0101] High density nucleic acid arrays can be fabricated by depositing presynthesized or natural nucleic acids in predefined positions. Synthesized or natural nucleic acids are deposited on specific locations of a substrate by light directed targeting and oligonucleotide directed targeting. Nucleic acids can also be directed to specific locations in much the same manner as the flow channel methods. For example, a nucleic acid A can be delivered to and coupled with a first group of reaction regions which have been appropriately activated. Thereafter, a nucleic acid B can be delivered to and reacted with a second group of activated reaction regions. Nucleic acids are deposited in selected regions. Another embodiment uses a dispenser that moves from region to region to deposit nucleic acids in specific spots. Typical dispensers include a micropipette or capillary pin to deliver nucleic acid to the substrate and a robotic system to control the position of the micropipette with respect to the substrate. In other embodiments, the dispenser includes a series of tubes, a manifold, an array of pipettes or capillary pins, or the like so that various reagents can be delivered to the reaction regions simultaneously.

[0102] 3. Hybridization

[0103] Nucleic acid hybridization simply involves contacting a probe and target nucleic acid under conditions where the probe and its complementary target can form stable hybrid duplexes through complementary base pairing. The nucleic acids that do not form hybrid duplexes are then washed away, leaving the hybridized nucleic acids to be detected, typically through detection of an attached detectable label. It is generally recognized that nucleic acids are denatured by increasing the temperature or decreasing the salt concentration of the buffer containing the nucleic acids. Under low stringency conditions (e.g., low temperature and/or high salt) hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or RNA:DNA) will form even where the annealed sequences are not perfectly complementary. Thus specificity of hybridization is reduced at lower stringency. Conversely, at higher stringency (e.g., higher temperature or lower salt) successful hybridization requires fewer mismatches.

[0104] One of skill in the art will appreciate that hybridization conditions may be selected to provide any degree of stringency. In a preferred embodiment, hybridization is performed at low stringency, in this case in 6×SSPE-T (0.005% Triton X-100) at 37° C., to ensure hybridization, and then subsequent washes are performed at higher stringency (e.g., 1×SSPE-T at 37° C.) to eliminate mismatched hybrid duplexes. Successive washes may be performed at increasingly higher stringency (e.g., down to as low as 0.25×SSPE-T at 37° C. to 50° C.) until a desired level of hybridization specificity is obtained. Stringency can also be increased by addition of agents such as formamide. Hybridization specificity may be evaluated by comparison of hybridization to the test probes with hybridization to the various controls that can be present (e.g., expression level control, normalization control, mismatch controls, etc.).

[0105] In general, there is a tradeoff between hybridization specificity (stringency) and signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest stringency that produces consistent results and that provides a signal intensity greater than approximately 10% of the background intensity. Thus, in a preferred embodiment, the hybridized array may be washed at successively higher stringency solutions and read between each wash. Analysis of the data sets thus produced will reveal a wash stringency above which the hybridization pattern is not appreciably altered and which provides adequate signal for the particular oligonucleotide probes of interest.

[0106] In a preferred embodiment, background signal is reduced by the use of a detergent (e.g., C-TAB) or a blocking reagent (e.g., sperm DNA, cot-1 DNA, etc.) during the hybridization to reduce non-specific binding. In a particularly preferred embodiment, the hybridization is performed in the presence of about 0.5 mg/ml DNA (e.g., herring sperm DNA). The use of blocking agents in hybridization is well known to those of skill in the art (see, e.g., Chapter 8 in P. Tijssen, supra).

[0107] The stability of duplexes formed between RNAs or DNAs is generally in the order of RNA:RNA>RNA:DNA>DNA:DNA, in solution. Long probes have better duplex stability with a target, but poorer mismatch discrimination than shorter probes (mismatch discrimination refers to the measured hybridization signal ratio between a perfect match probe and a single base mismatch probe). Shorter probes (e.g., 8-mers) discriminate mismatches very well, but the overall duplex stability is low.

[0108] Altering the thermal stability (T_(m)) of the duplex formed between the target and the probe using, e.g., known oligonucleotide analogues allows for optimization of duplex stability and mismatch discrimination. One useful aspect of altering the T_(m) arises from the fact that adenine-thymine (A-T) duplexes have a lower T_(m) than guanine-cytosine (G-C) duplexes, due in part to the fact that the A-T duplexes have 2 hydrogen bonds per base pair, while the G-C duplexes have 3 hydrogen bonds per base pair. In heterogeneous oligonucleotide arrays in which there is a non-uniform distribution of bases, it is not generally possible to optimize hybridization for each oligonucleotide probe simultaneously. Thus, in some embodiments, it is desirable to selectively destabilize G-C duplexes and/or to increase the stability of A-T duplexes. This can be accomplished, e.g., by substituting guanine residues in the probes of an array which form G-C duplexes with hypoxanthine, or by substituting adenine residues in probes which form A-T duplexes with 2,6-diaminopurine, or by using the salt tetramethylammonium chloride (TMACl) in place of NaCl.
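The dependence of T_(m) on base composition can be illustrated with the Wallace rule, a common rule of thumb for short oligonucleotides (roughly 2° C. per A or T and 4° C. per G or C). The rule is not part of this disclosure and is used here only as an assumed approximation; the function name and the example sequences are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (deg C) of a short oligonucleotide.

    Uses the Wallace rule of thumb: 2 * (A + T) + 4 * (G + C).
    This is an assumed approximation, not a method taken from the text.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


# Two hypothetical 8-mer probes of equal length but different base composition.
at_rich = wallace_tm("ATATATAT")
gc_rich = wallace_tm("GCGCGCGC")
print(at_rich, gc_rich)
```

Under this approximation the G-C-rich probe melts at a markedly higher temperature than the A-T-rich probe of the same length, which is why a single hybridization condition cannot suit every probe on a heterogeneous array.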

[0109] Altered duplex stability conferred by using oligonucleotide analogue probes can be ascertained by following, e.g., fluorescence signal intensity of oligonucleotide analogue arrays hybridized with a target oligonucleotide over time. The data allow optimization of specific hybridization conditions at, e.g., room temperature (for simplified diagnostic applications in the future).

[0110] Another way of verifying altered duplex stability is by following the signal intensity generated upon hybridization with time. Previous experiments using DNA targets and DNA chips have shown that signal intensity increases with time, and that the more stable duplexes generate higher signal intensities faster than less stable duplexes. The signals reach a plateau or “saturate” after a certain amount of time due to all of the binding sites becoming occupied. These data allow for optimization of hybridization, and determination of the best conditions at a specified temperature.

[0111] Methods of optimizing hybridization conditions are well known to those of skill in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993)).

[0112] (C) Signal Detection

[0113] In a preferred embodiment, the hybridized nucleic acids are detected by detecting one or more labels attached to the sample nucleic acids. The labels may be incorporated by any of a number of means well known to those of skill in the art. However, in a preferred embodiment, the label is simultaneously incorporated during the amplification step in the preparation of the sample nucleic acids. Thus, for example, polymerase chain reaction (PCR) with labeled primers or labeled nucleotides will provide a labeled amplification product. In a preferred embodiment, transcription amplification, as described above, using a labeled nucleotide (e.g. fluorescein-labeled UTP and/or CTP) incorporates a label into the transcribed nucleic acids.

[0114] Alternatively, a label may be added directly to the original nucleic acid sample (e.g., mRNA, polyA mRNA, cDNA, etc.) or to the amplification product after the amplification is completed. Means of attaching labels to nucleic acids are well known to those of skill in the art and include, for example, nick translation or end-labeling (e.g. with a labeled RNA) by kinasing of the nucleic acid and subsequent attachment (ligation) of a nucleic acid linker joining the sample nucleic acid to a label (e.g., a fluorophore).

[0115] Detectable labels suitable for use in the present invention include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Useful labels in the present invention include biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241.

[0116] Means of detecting such labels are well known to those of skill in the art. Thus, for example, radiolabels may be detected using photographic film or scintillation counters, and fluorescent markers may be detected using a photodetector to detect emitted light. Enzymatic labels are typically detected by providing the enzyme with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate, and colorimetric labels are detected by simply visualizing the colored label. One particularly preferred method uses a colloidal gold label that can be detected by measuring scattered light.

[0117] The label may be added to the target (sample) nucleic acid(s) prior to, or after, the hybridization. So-called “direct labels” are detectable labels that are directly attached to or incorporated into the target (sample) nucleic acid prior to hybridization. In contrast, so-called “indirect labels” are joined to the hybrid duplex after hybridization. Often, the indirect label is attached to a binding moiety that has been attached to the target nucleic acid prior to the hybridization. Thus, for example, the target nucleic acid may be biotinylated before the hybridization. After hybridization, an avidin-conjugated fluorophore will bind the biotin-bearing hybrid duplexes, providing a label that is easily detected. For a detailed review of methods of labeling nucleic acids and detecting labeled hybridized nucleic acids see Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993).

[0118] Fluorescent labels are preferred and easily added during an in vitro transcription reaction. In a preferred embodiment, fluorescein-labeled UTP and CTP are incorporated into the RNA produced in an in vitro transcription reaction as described above.

[0119] Means of detecting labeled target (sample) nucleic acids hybridized to the probes of the high density array are known to those of skill in the art. Thus, for example, where a colorimetric label is used, simple visualization of the label is sufficient. Where a radioactive labeled probe is used, detection of the radiation (e.g. with photographic film or a solid state detector) is sufficient.

[0120] In a preferred embodiment, however, the target nucleic acids are labeled with a fluorescent label and the localization of the label on the probe array is accomplished with fluorescence microscopy. The hybridized array is excited with a light source at the excitation wavelength of the particular fluorescent label and the resulting fluorescence at the emission wavelength is detected. In a particularly preferred embodiment, the excitation light source is a laser appropriate for the excitation of the fluorescent label.

[0121] The confocal microscope may be automated with a computer-controlled stage to automatically scan the entire high density array. Similarly, the microscope may be equipped with a phototransducer (e.g., a photomultiplier, a solid state array, a CCD camera, etc.) attached to an automated data acquisition system to automatically record the fluorescence signal produced by hybridization to each oligonucleotide probe on the array. Such automated systems are described at length in U.S. Pat. No. 5,143,854, PCT Application WO 92/10092, and copending U.S. application Ser. No. 08/195,889 filed on Feb. 10, 1994. Use of laser illumination in conjunction with automated confocal microscopy for signal detection permits detection at a resolution of better than about 100 μm, more preferably better than about 50 μm, and most preferably better than about 25 μm.

[0122] One of skill in the art will appreciate that methods for evaluating the hybridization results vary with the nature of the specific probe nucleic acids used as well as the controls provided. In the simplest embodiment, the fluorescence intensity for each probe is simply quantified. This is accomplished by measuring probe signal strength at each location (representing a different probe) on the high density array (e.g., where the label is a fluorescent label, detection of the amount of fluorescence (intensity) produced by a fixed excitation illumination at each location on the array). Comparison of the absolute intensities of an array hybridized to nucleic acids from a “test” sample with intensities produced by a “control” sample provides a measure of the relative expression of the nucleic acids that hybridize to each of the probes.
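The test-versus-control comparison can be sketched as a per-probe intensity ratio. The text says only that the two sets of intensities are compared, so the use of a simple ratio, the function name, and the probe names and values are all assumptions made for illustration.

```python
def relative_expression(test, control):
    """Return the test/control intensity ratio for each probe present in both arrays.

    `test` and `control` map a probe name to its measured intensity.
    Probes with a non-positive control intensity are skipped, because a
    ratio is undefined for them (an assumption of this sketch).
    """
    ratios = {}
    for probe, test_value in test.items():
        control_value = control.get(probe)
        if control_value is not None and control_value > 0:
            ratios[probe] = test_value / control_value
    return ratios


# Hypothetical intensities in arbitrary fluorescence units.
test_array = {"geneA": 400.0, "geneB": 50.0}
control_array = {"geneA": 100.0, "geneB": 100.0, "geneC": 10.0}
ratios = relative_expression(test_array, control_array)
print(ratios)
```

A ratio above 1 suggests higher expression in the test sample and a ratio below 1 suggests lower expression, subject to the normalization and background considerations discussed in the following paragraphs.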

[0123] One of skill in the art, however, will appreciate that hybridization signals will vary in strength with efficiency of hybridization, the amount of label on the sample nucleic acid and the amount of the particular nucleic acid in the sample. Typically nucleic acids present at very low levels (e.g., <1 pM) will show a very weak signal. At some low level of concentration, the signal becomes virtually indistinguishable from background. In evaluating the hybridization data, a threshold intensity value may be selected below which a signal is treated as essentially indistinguishable from background and is not counted.

[0124] Where it is desirable to detect nucleic acids expressed at lower levels, a lower threshold is chosen. Conversely, where only high expression levels are to be evaluated, a higher threshold level is selected. In a preferred embodiment, a suitable threshold is about 10% above that of the average background signal.
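The thresholding described above can be sketched as follows, assuming the background is estimated from a set of background readings and that the cutoff sits a fixed fraction above their mean. The function names, the way background is sampled, and the example values are hypothetical.

```python
from statistics import mean


def detection_threshold(background, fraction_above=0.10):
    """Cutoff set a given fraction above the average background signal."""
    return mean(background) * (1.0 + fraction_above)


def detected(signals, background, fraction_above=0.10):
    """Flag each probe whose signal exceeds the cutoff.

    `signals` maps a probe name to its intensity. Raising `fraction_above`
    restricts the analysis to more highly expressed sequences.
    """
    cutoff = detection_threshold(background, fraction_above)
    return {probe: value > cutoff for probe, value in signals.items()}


# Hypothetical readings in arbitrary fluorescence units.
background_readings = [98.0, 102.0, 100.0]
probe_signals = {"geneA": 120.0, "geneB": 105.0}
flags = detected(probe_signals, background_readings)
print(flags)
```

With a mean background of 100 units the default cutoff is about 110 units, so only the first probe in this example is counted.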

[0125] In addition, the provision of appropriate controls permits a more detailed analysis that controls for variations in hybridization conditions, cell health, non-specific binding and the like. Thus, for example, in a preferred embodiment, the hybridization array is provided with normalization controls. These normalization controls are probes complementary to control sequences added in a known concentration to the sample. Where the overall hybridization conditions are poor, the normalization controls will show a smaller signal reflecting reduced hybridization. Conversely, where hybridization conditions are good, the normalization controls will provide a higher signal reflecting the improved hybridization. Normalization of the signal derived from other probes in the array to the normalization controls thus provides a control for variations in hybridization conditions. Typically, normalization is accomplished by dividing the measured signal from the other probes in the array by the average signal produced by the normalization controls. Normalization may also include correction for variations due to sample preparation and amplification. Such normalization may be accomplished by dividing the measured signal by the average signal from the sample preparation/amplification control probes (e.g., the Bio B probes). The resulting values may be multiplied by a constant value to scale the results.
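The normalization step described above (divide each probe signal by the average signal of the normalization controls, then optionally scale by a constant) can be sketched as below. The function name, the data layout, and the example values are assumptions made for illustration.

```python
from statistics import mean


def normalize(probe_signals, control_signals, scale=1.0):
    """Divide each probe signal by the mean control signal, then scale.

    `probe_signals` maps a probe name to its measured intensity and
    `control_signals` lists the intensities of the normalization controls.
    """
    control_mean = mean(control_signals)
    if control_mean <= 0:
        raise ValueError("normalization controls must give a positive mean signal")
    return {name: scale * value / control_mean
            for name, value in probe_signals.items()}


# Hypothetical intensities: the same sample read under good and poor conditions.
good_run = normalize({"geneA": 200.0, "geneB": 50.0}, [100.0, 100.0])
poor_run = normalize({"geneA": 100.0, "geneB": 25.0}, [50.0, 50.0])
print(good_run, poor_run)
```

In this example the poor run yields half the raw signal for every probe, yet both runs give the same normalized values, which is the purpose of the controls. The same function could be applied a second time with the sample preparation/amplification control signals.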

[0126] As indicated above, the high density array can include mismatch controls. In a preferred embodiment, there is a mismatch control having a central mismatch for every probe (except the normalization controls) in the array. It is expected that after washing in stringent conditions, where a perfect match would be expected to hybridize to the probe, but not to the mismatch, the signal from the mismatch controls should only reflect non-specific binding or the presence in the sample of a nucleic acid that hybridizes with the mismatch. Where the probe in question and its corresponding mismatch control both show high signals, or the mismatch shows a higher signal than its corresponding test probe, there is a problem with the hybridization and the signal from those probes is ignored. The difference in hybridization signal intensity between the target-specific probe and its corresponding mismatch control is a measure of the discrimination of the target-specific probe. Thus, in a preferred embodiment, the signal of the mismatch probe is subtracted from the signal from its corresponding test probe to provide a measure of the signal due to specific binding of the test probe.

[0127] The concentration of a particular sequence can then be determined by measuring the signal intensity of each of the probes that bind specifically to that gene and normalizing to the normalization controls. Where the signal from the probes is greater than the mismatch, the mismatch is subtracted. Where the mismatch intensity is equal to or greater than its corresponding test probe, the signal is ignored. The expression level of a particular gene can then be scored by the number of positive signals (either absolute or above a threshold value), the intensity of the positive signals (either absolute or above a selected threshold value), or a combination of both metrics (e.g., a weighted average).
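The mismatch subtraction and scoring rules above can be sketched as follows. Each probe pair is represented as a (perfect match, mismatch) tuple of already normalized intensities; the function names and the example numbers are hypothetical, and the score shown (count of positive pairs plus their mean intensity) is only one of the combinations the text allows.

```python
def specific_signals(pairs):
    """Subtract the mismatch from each test probe signal.

    Pairs in which the mismatch intensity is equal to or greater than the
    test probe intensity are ignored, as described in the text.
    """
    return [pm - mm for pm, mm in pairs if pm > mm]


def score_gene(pairs):
    """Return (number of positive pairs, mean specific signal) for one gene."""
    kept = specific_signals(pairs)
    average = sum(kept) / len(kept) if kept else 0.0
    return len(kept), average


# Hypothetical normalized (perfect match, mismatch) intensities for one gene.
gene_pairs = [(500.0, 100.0), (300.0, 350.0), (200.0, 200.0), (400.0, 150.0)]
positives, mean_signal = score_gene(gene_pairs)
print(positives, mean_signal)
```

Here the second and third pairs are discarded because the mismatch is at least as bright as the test probe, leaving two positive pairs.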

[0128] In some preferred embodiments, a computer system is used to compare the hybridization intensities of the perfect match and mismatch probes of each pair. If the gene is expressed, the hybridization intensity (or affinity) of a perfect match probe of a pair should be recognizably higher than that of the corresponding mismatch probe. Generally, if the hybridization intensities of a pair of probes are substantially the same, it may indicate the gene is not expressed. However, the determination is not based on a single pair of probes; the determination of whether a gene is expressed is based on an analysis of many pairs of probes.

[0129] After the system compares the hybridization intensity of the perfect match and mismatch probes, the system indicates expression of the gene. As an example, the system may indicate to a user that the gene is either present (expressed), marginal or absent (unexpressed). Specific procedures for data analysis are disclosed in U.S. application Ser. No. 08/772,376, previously incorporated for all purposes.
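One way such a present/marginal/absent call over many probe pairs might look is sketched below. This is not the procedure of the cited application: the voting scheme, the function name, and every cutoff (`min_ratio`, `present_fraction`, `marginal_fraction`) are hypothetical values chosen only to illustrate the idea of deciding from many pairs rather than one.

```python
def call_expression(pairs, min_ratio=1.5, present_fraction=0.6,
                    marginal_fraction=0.4):
    """Classify a gene from its (perfect match, mismatch) probe pairs.

    A pair votes for expression when the perfect match signal exceeds
    `min_ratio` times the mismatch signal. All cutoffs are hypothetical.
    """
    if not pairs:
        raise ValueError("at least one probe pair is required")
    votes = sum(1 for pm, mm in pairs if pm > min_ratio * mm)
    fraction = votes / len(pairs)
    if fraction >= present_fraction:
        return "present"
    if fraction >= marginal_fraction:
        return "marginal"
    return "absent"


# Hypothetical probe sets of five pairs each.
strong = [(300.0, 100.0)] * 4 + [(100.0, 100.0)]
mixed = [(300.0, 100.0)] * 2 + [(100.0, 100.0)] * 3
flat = [(100.0, 100.0)] * 5
print(call_expression(strong), call_expression(mixed), call_expression(flat))
```

The point of the sketch is the design choice stated in the text: a single ambiguous pair does not decide the call, because the result depends on the proportion of pairs that agree.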

[0130] In addition to high density nucleic acid arrays, other methods are also useful for massive gene expression monitoring. Differential display, described by Liang, P. and Pardee, A. B. (Differential Display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257:967-971, 1992, incorporated herein by reference for all purposes) provides a useful means for distinguishing gene expression between two samples. Serial analysis of gene expression, described by Velculescu et al. (Serial Analysis of Gene Expression. Science, 270:484-487, 1995, incorporated herein by reference for all purposes) provides another method for quantitative and qualitative analysis of gene expression. Optical fiber oligonucleotide sensors, described by Ferguson et al. (A Fiber-optic DNA biosensor microarray for the analysis of gene expression. Nature-Biotechnology 14:1681-1684, 1996), can also be used for gene expression monitoring.

[0131] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes.

EXAMPLES

[0132] Biomedical research has been aided tremendously by three developments: (i) the ability to generate small molecule libraries using combinatorial chemistry methods coupled with high-throughput screening, (ii) the enormous increase in the number of newly identified gene sequences from a host of different organisms, and (iii) the use of structural methods for the detailed characterization of ligand-protein interaction sites that can be exploited for ligand design. Here we applied these methods to the synthesis and characterization of potent, selective inhibitors of protein kinases involved in cell cycle control. The central role that cyclin-dependent kinases (CDKs) play in the timing of cell division and the high incidence of genetic alteration of CDKs or deregulation of CDK inhibitors in a number of cancers make CDKs a promising target for the design of selective inhibitors. Our approach to inhibiting CDKs has been to block the adenosine triphosphate (ATP)-binding site with compounds derived from combinatorial libraries of 2,6,9-trisubstituted purines. This strategy was motivated by the binding mode of the purine olomoucine, which exhibits good selectivity but only moderate inhibition [IC₅₀ (50% kinase inhibition)=7 μM] of a subset of the CDK family of protein kinases (1). The orientation of the purine ring of olomoucine within the ATP-binding site of CDK2 is rotated almost 160° relative to that of the adenosine ring of ATP. Thus, it seemed that the introduction of new substituents at the 2, 6, and 9 positions of the purine ring, rather than substituents appended to the ribose, as is normally done, might lead to enhanced binding affinity and selectivity. A combinatorial approach to modifying the purine scaffold could be valuable in the search for potent and selective inhibitors of various cellular processes because of the ubiquitous occurrence of enzymes that use purines, including the estimated 2000 kinases encoded in the human genome.

[0133] To examine the effects of a range of diverse substituents on the purine ring, we synthesized combinatorial libraries in which the 2, 6, and 9 positions could be varied starting with a 2-fluoro-6-chloropurine framework (FIG. 1A) (2, 3). Substitution chemistry was used to install amines at the 2 and 6 positions, and a Mitsunobu reaction (4, 5) was used to alkylate the N9 position of the purine core. The substitution chemistry allows introduction of primary and secondary amines bearing a wide range of functional groups, whereas the Mitsunobu reaction tolerates primary and secondary alcohols lacking additional acidic hydrogens. Newly appended groups can be modified combinatorially in subsequent steps with a variety of chemistries including acylation, reductive amination, and Suzuki coupling reactions (6). During library synthesis, one position is held invariant to allow attachment to the solid support. Libraries are synthesized in a spatially separated format with either a pin apparatus (7) or polystyrene resin and screened for kinase inhibitors with a 96-well, solution-phase phosphorylation assay.

FIG. 1. (A) Scheme for the combinatorial synthesis of 2,6,9-trisubstituted purines from a 2-, 6-, or 9-linked purine scaffold with amination and alkylation chemistries. Chemical structures of CDK inhibitors: (B) flavopiridol, (C) olomoucine and roscovitine, (D) purvalanol A and B, and (E) 52 and 52Me.

[0134] Several purine libraries in which the 2, 6, and 9 substituents were varied separately were iteratively synthesized and screened. We identified a number of 3- and 4-substituted benzylamine and aniline substituents that lead to significant improvements in CDK2 binding when introduced at the 6 position of the purine ring. For example, replacement of the benzylamino group of olomoucine at the C6 position with 3-chloroaniline resulted in a 10-fold improvement in the IC₅₀. Although a variety of hydroxyalkylamino, dihydroxyalkylamino, and cycloalkylamino substituents at the 2 position resulted in moderate improvements in binding affinity, greater increases were achieved with amino alcohols derived from alanine, valine, and isoleucine. For example, the R-isopropyl side chain of valinol resulted in a 6.5-fold increase relative to the hydroxyethyl substituent of olomoucine. In contrast to many protein kinases that can accommodate larger substituents at the N9 of the purine ring, CDK2 binding was strongest for those purines bearing small alkyl or hydroxyalkyl substituents. Those substituents that resulted in the most potent CDK2 inhibition were combined in second-generation libraries by solution-phase chemistry. The IC₅₀ data for these series of compounds indicate that the inhibitory effects of these substituents are approximately additive.

[0135] Currently, our most potent inhibitor, 2-(1R-isopropyl-1-hydroxyethylamino)-6-(3-chloro-4-carboxyanilino)-9-isopropylpurine (purvalanol B, FIG. 1D) has an IC₅₀ against the complex of CDK2-cyclin A of 6 nM, which corresponds to a 1000-fold increase in potency over olomoucine and a 30-fold increase over flavopiridol (FIG. 1B), one of the most potent and selective CDK2 inhibitors known and currently in human clinical trials (8). Purvalanol B shows a high degree of selectivity: among the 22 purified human kinases tested (1, 9), only a subset of the CDKs (cdc2-cyclin B, CDK2-cyclin A, CDK2-cyclin E, CDK5-p35) were significantly inhibited (Table 1). Several close analogs of purvalanol B were also potent inhibitors of cdc2 and CDK2, including the more membrane permeable analog purvalanol A and compound 52 [2-(2-hydroxyethylamino)-6-(3-chloroanilino)-9-isopropylpurine, IC₅₀=340 nM against cdc2-cyclin B] (FIG. 1E, Table 2). We also assessed the selectivity of purvalanol A, compound 52, and an N6-methylated version of compound 52 (52Me) against four yeast CDKs (10) (Cdc28p, Kin28p, Pho85p, and Srb10p) and the related kinase Cak1p using kinase assays performed in immunoprecipitates (Table 2) (11). Of the yeast kinases tested, only the cell cycle-regulating kinase Cdc28p and the highly homologous Pho85p kinase (50% identity to Cdc28p), which is involved in phosphate metabolism, were inhibited by purvalanol A and 52. Compound 52Me did not inhibit any of the CDKs tested.

TABLE 1. IC₅₀ values for purvalanol (purv.) A and B for a variety of purified kinases

Kinase                              Purv. A (IC₅₀ nM)   Purv. B (IC₅₀ nM)
cdc2-cyclin B                       4                   6
cdc2-cyclin B (150 μM ATP)          40                  50
cdc2-cyclin B (1.5 mM ATP)          500                 250
cdk2-cyclin A                       70                  6
cdk2-cyclin E                       35                  9
cdk4-cyclin D1                      850                 >10,000
cdk5-p35                            75                  6
erk1                                9,000               3,333
c-jun NH₂-terminal kinase           >1,000              >10,000
Protein kinase C                    >10,000             >100,000
Protein kinase C1                   >10,000             >100,000
Protein kinase C2                   >10,000             >100,000
Protein kinase C                    >10,000             >100,000
Protein kinase C                    >100,000            >100,000
Protein kinase C                    >100,000            >100,000
Protein kinase C                    >100,000            >100,000
Protein kinase C                    >100,000            100,000
cAMP-dependent protein kinase       9,000               3,800
cGMP-dependent protein kinase       >10,000             >100,000
Casein kinase 1                     >3,333              >3,333
GSK3-                               >10,000             >10,000
Insulin-receptor tyrosine kinase    5,000               2,200
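The fold comparisons quoted in this example are ratios of IC₅₀ values, and the arithmetic can be checked with a short sketch. The IC₅₀ values are taken from the text and Table 1; the function name and the choice of erk1 as the off-target kinase for the selectivity ratio are illustrative assumptions.

```python
def fold_difference(reference_ic50_nm, inhibitor_ic50_nm):
    """Ratio of two IC50 values expressed in the same unit (nM here).

    A value above 1 means the inhibitor is more potent than the reference.
    """
    if inhibitor_ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return reference_ic50_nm / inhibitor_ic50_nm


# Values from the text: olomoucine 7 uM, purvalanol B 6 nM against CDK2-cyclin A.
olomoucine_nm = 7_000.0
purvalanol_b_cdk2_nm = 6.0
potency_gain = fold_difference(olomoucine_nm, purvalanol_b_cdk2_nm)

# Selectivity of purvalanol B for cdk2-cyclin A over erk1 (Table 1: 3,333 nM).
erk1_nm = 3_333.0
selectivity = fold_difference(erk1_nm, purvalanol_b_cdk2_nm)
print(round(potency_gain), round(selectivity))
```

The first ratio comes out slightly above 1000, consistent with the roughly 1000-fold figure stated for purvalanol B relative to olomoucine.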

[0136] TABLE 2. IC₅₀ values for 52 and 52Me for immunoprecipitated yeast kinases

Kinase      52 (IC₅₀ μM)    52Me (IC₅₀ μM)
Cdc28p      7               >500
Pho85p      2               >500
Kin28p      >500            >500
Srb10p      >500            >500
Cak1p       >500            >500

[0137] To explore the structural basis for the selectivity and affinity of these inhibitors we determined the crystal structure of the human CDK2-purvalanol B complex to 2.05 Å resolution (12) (FIG. 2). The electron density map shows that binding of purvalanol B to the CDK2 crystals is well ordered except for the 3-chloroanilino group, which appears to be bound in two alternative conformations (FIG. 2). Purvalanol B fits snugly into the ATP-binding site, as is evident by the 86% complementarity between the surface area buried by the inhibitor (364 Å²) compared with the available binding surface in the active site of the protein (423 Å²). The overall geometry of purvalanol B bound to CDK2 resembles that of the related adenine-substituted inhibitors in the CDK2-olomoucine and CDK2-roscovitine complexes, with the purine ring and its C2, N6, and N9 substituents occupying similar binding pockets. The purine ring makes mostly hydrophobic and van der Waals contacts with CDK2 residues. A pair of conserved H bonds are present between the N7 imidazole nitrogen and the backbone NH of Leu⁸³, and between the N6 amino group and the backbone carbonyl of Leu⁸³; this latter interaction likely accounts for the greatly reduced inhibitory activity resulting from methylation of N6 in compound 52Me. Furthermore, all three 2,6,9-trisubstituted adenines form a H bond between the acidic C8 atom of the purine ring and the carbonyl oxygen of Glu⁸¹, an infrequently observed interaction in the crystal structures of nucleic acids and proteins (13).
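The 86% complementarity figure follows from the two surface areas quoted above, as this short check shows. The function name is hypothetical, and treating complementarity as the simple ratio of buried to available surface area is an assumption that happens to reproduce the stated value.

```python
def surface_complementarity(buried_area, available_area):
    """Fraction of the available binding surface that the inhibitor buries.

    Both areas must be in the same unit (square angstroms here).
    """
    if available_area <= 0:
        raise ValueError("available area must be positive")
    return buried_area / available_area


# Areas quoted in the text for purvalanol B in the CDK2 ATP-binding site.
buried = 364.0      # square angstroms buried by the inhibitor
available = 423.0   # square angstroms available in the active site
percent = 100.0 * surface_complementarity(buried, available)
print(f"{percent:.1f}%")
```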

[0138] The C2 side chain of purvalanol B is bound in the ATP ribose-binding pocket (FIG. 2A, structure 3), with the R-isopropyl group closely packed against backbone atoms of the glycine-rich loop and the hydroxyl group making a H bond with the backbone carbonyl of Gln¹³³. The R-isopropyl side chain of purvalanol B leads to a significant repositioning of the C2 substituent relative to the R-ethyl substituent of roscovitine (FIG. 2A, structure 1), resulting in an open pocket in the active site lined by the polar side chains of Lys³³, Asn¹³², and Asp¹⁴⁵. In the CDK2-flavopiridol complex, this region is occupied by the N-methylpiperidinyl ring of the inhibitor (FIG. 2A, structure 2), suggesting that further increases in affinity of purvalanol B may result from appending substituents at the C2 position that interact with this site. The 3-chloroanilino group at N6 of purvalanol B points toward the outside of the ATP-binding pocket, a region not occupied in the CDK2-ATP complex. Interactions in this region are likely responsible for the increased affinity and selectivity of the inhibitors compared with ATP and are evident in the CDK2 complexes of flavopiridol, olomoucine, and roscovitine as well. In the CDK2-purvalanol B complex, the 3-chloroanilino group of the inhibitor is packed tightly against the side chains of Ile¹⁰ and Phe⁸². Further stabilization of the binding of the 3-chloroanilino group comes from a polar interaction between the Cl and the side chain of Asp⁸⁶, which appears to be present in about two-thirds of the molecules in the CDK2-purvalanol B crystals. In the other conformation the phenyl ring of the 3-chloroanilino group is flipped ˜160°, suggesting a partially protonated state of Asp⁸⁶. In addition to improved packing interactions, the increased binding affinity of purvalanol B relative to olomoucine may result from steric constraints imposed by the purine and chlorinated aniline ring systems that limit the number of conformations of the inhibitor. Numerous substituents at the 4 position of the aniline ring were tolerated, consistent with the solvent accessibility of this site, which makes this position an obvious candidate for altering both the solubility and membrane permeability. Finally, the N9 isopropyl group of purvalanol B packs in a small hydrophobic pocket formed by the side chains of Val¹⁸, Ala³¹, Phe⁸⁰, Leu¹³⁴, and Ala¹⁴⁴, consistent with the narrow range of substituents that can be tolerated at this position.

[0139] To determine the cellular effects of these CDK-directed cell cycle inhibitors, we tested purvalanol A on the NCI panel of 60 human tumor cell lines (leukemia, non-small cell lung cancer, colon cancer, renal cancer, prostate cancer, and breast cancer). Although the average GI₅₀ (50% growth inhibition) is 2 μM, two cell lines out of the 60 showed an ˜20-fold increase in sensitivity to purvalanol A: the KM12 colon cancer cell line with a GI₅₀ of 76 nM and the NCI-H522 non-small cell lung cancer cell line with a GI₅₀ of 347 nM. Fluorescence-activated cell sorting (FACS) analysis of human lung fibroblast cells treated with a structural analog of purvalanol A, 2-(bis-(hydroxyethyl)amino)-6-(4-methoxybenzylamino)-9-isopropylpurine, showed both G₁-S and G₂-M inhibitory activity at high concentrations and predominant G₁-S inhibition at lower concentrations (14). Significant inhibition was also observed in Saccharomyces cerevisiae, where compound 52 inhibited growth in a drug-sensitized yeast strain (15) with a GI₅₀ of 30 μM. In contrast, the closely related compound 52Me proved to be a significantly weaker inhibitor of yeast growth (GI₅₀=200 μM) (16).

[0140] Conclusion

[0141] It is to be understood that the above description is intended to be illustrative and not restrictive. Many variations of the invention will be apparent to those of skill in the art upon reviewing the above description. By way of example, the invention has been described primarily with reference to the use of a high density oligonucleotide array, but it will be readily recognized by those of skill in the art that other nucleic acid arrays, other methods of measuring transcript levels and gene expression monitoring at the protein level could be used. The scope of the invention should, therefore, be determined not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

[0142] References and Notes

[0143] 1. J. Vesely, et al., Eur. J. Biochem. 224, 771 (1994); L. Meijer, et al., ibid. 243, 527 (1997).

[0144] 2. N. S. Gray, S. Kwon, P. G. Schultz, Tetrahedron Lett. 38, 1161 (1997).

[0145] 3. T. C. Norman, N. S. Gray, J. T. Koh, J. Am. Chem. Soc. 118, 7430 (1996).

[0146] 4. O. Mitsunobu, Synthesis 1, 1 (1981).

[0147] 5. A. Toyota, N. Katagiri, C. Kaneko, Heterocycles 36, 1625 (1993).

[0148] 6. B. J. Backes and J. A. Ellman, J. Am. Chem. Soc. 116, 11171 (1994).

[0149] 7. H. M. Geysen, S. J. Rodda, T. J. Mason, G. Tribbick, P. G. Schoofs, J. Immunol. Methods 102, 259 (1987).

[0150] 8. L. Meijer, Trends in Cell Biol. 6, 393 (1996).

[0151] 9. Starfish is the major source for cdc2-cyclin B kinase. The recombinant human cdc2-cyclin B is likely to contain inactive monomers and dimers that would interfere with CDK inhibition assays [see (11)].

[0152] 10. D. O. Morgan, Annu. Rev. Cell Dev. Biol. 13, 261 (1997).

[0153] 11. Supplemental material is available at http://www.sciencemag.org/feature/data/976815.shl.

[0154] 12. Crystallography statistics for the CDK2-purvalanol B complex. Data: space group, P2₁2₁2₁; cell constants a=53.55 Å, b=71.35 Å, c=72.00 Å; resolution 32 to 2.05 Å. Number of unique reflections=17655, completeness=98.7% (91.6% from 2.11 to 2.05 Å), R_(merge)=Σ_(hi)|I_(hi)−I_(h)|/Σ_(hi)I_(hi)=5.5%, where h are unique reflection indices and i indicate symmetry equivalent indices. Refinement calculations: R_(factor)=Σ|F_(o)−F_(c)|/ΣF_(o)=18.8%, where F_(o) and F_(c) are the observed and calculated structure factors, respectively; R_(free)=26.4% (same calculation as for R_(factor), but with 5% of the data); average atomic B values for protein=31.4 Å², inhibitor=32.2 Å², waters=37.7 Å². Observed deviations: root mean square (rms) bond lengths=0.008 Å; rms bond angles=1.31°. The final model includes 279 residues of CDK2 (residues 36 to 43 and 153 to 163 are not included because of weak or missing electron density), purvalanol B, 91 water molecules, and one molecule of ethylene glycol. Efforts to crystallize the CDK2-purvalanol A complex resulted in crystals of poor quality.

[0155] 13. M. C. Wahl and M. Sundaralingam, Trends Biochem. Sci. 22, 97 (1997).

[0156] 14. E. E. Brooks, et al., J. Biol. Chem. 272, 29207 (1997).

[0157] 15. Because of weak inhibition of yeast growth by flavopiridol we used a strain with three drug-sensitizing deletions (erg6, pdr5, snq2). This strain showed GI₅₀ for 52 and flavopiridol at concentrations of 30 and 7 μM, respectively. Three cultures [110 ml, in yeast extract, peptone, and dextrose (YPD)] were inoculated with single colonies of YRP1 (MAT, erg6::LEU2, pdr5::TRP1, snq2::HIS6) and grown at 30° C. with constant agitation in a water bath incubator. When the cell density reached an optical density (OD) of 0.9 (at a wavelength of 600 nm), 27.5 μl of a 100 mM dimethyl sulfoxide (DMSO) stock solution of 52 or flavopiridol or DMSO alone was added. After 2 hours the cells were harvested by centrifugation and flash frozen with liquid nitrogen. For the temperature-sensitive cdc28 mutants, three cultures (75 ml, YPD) of AFS199 (cdc28-13), AA104 (cdc28-4), and their isogenic background strain AFS34 (MATa, ade2-1, his3-11, leu2-3, trp1-1, ura3) were grown from single colonies to an OD of 0.9 (600 nm) and harvested as described. Frozen cells were stored at −80° C.

[0158] 16. The diminished growth inhibitory activity of compound 52Me is unlikely to result from poorer bioavailability because a similar N6 methylation is observed to increase the in vivo potency of a related series of purine-based inhibitors. The residual growth inhibitory activity of 52Me likely reflects activity against other cellular targets. Compounds 52, 52Me, and flavopiridol failed to cause a uniform arrest morphology in yeast. FACS analysis also did not reveal synchronization of yeast cells after treatment with 52 or 52Me, which may be due to inhibition of a variety of CDKs responsible for different cell cycle transitions (as is observed in FACS experiments on mammalian cells) or activity against other targets not specifically examined in vitro.

[0159] 17. D. J. Lockhart et al., Nature Biotechnol. 14, 1675 (1996).

[0160] 18. L. Wodicka, H. Dong, M. Mittmann, M.-H. Ho, D. J. Lockhart, ibid. 15, 1359 (1997).

[0161] 19. J. L. DeRisi, V. R. Iyer, P. O. Brown, Science 278, 680 (1997).

[0162] 20. S. P. A. Fodor, et al., ibid. 251, 767 (1991).

[0163] 21. Transcripts that showed a significant and reproducible change in concentration (greater than twofold) in cells treated with the compounds between triplicate hybridizations for each of at least two independent experiments were examined further.

[0164] 22. F. R. Cross, Curr. Opin. Cell Biol. 7, 790 (1995).

[0165] 23. A. J. Van Wijnen, et al., Proc. Natl. Acad. Sci. U.S.A. 91, 12882 (1994).

[0166] 24. E. M. Lenburg and E. K. O'Shea, Trends Biochem. Sci. 21, 383 (1996).

[0167] 25. L. W. Bergman and B. K. Timblin, Mol. Microbiol. 26, 981 (1997).

[0168] 26. L. W. Bergman, K. Tatchell, B. K. Timblin, Genetics 143, 57 (1996).

[0169] 27. Because the PHO85 gene is nonessential, it should be possible to determine if these inductions are a direct consequence of Pho85p kinase inhibition by determining if the same inductions are seen after treating with inhibitor in a strain lacking the kinase.

[0170] 28. B. Andrews, et al., Mol. Cell. Biol. 17, 1212 (1997).

[0171] 29. P. Mazur, et al., ibid. 15, 5671 (1995).

[0172] 30. Transcript profiles were also measured for the cdc28 temperature-sensitive allele cdc28-13. The cdc28-13 strain contains an arginine to asparagine mutation at residue 283 near the COOH-terminus, which does not significantly affect kinase activity at the permissive temperature but does cause cell cycle arrest when switched to the nonpermissive temperature (32). The cdc28-13 strain showed very few changes in mRNA transcripts when compared with wild type at the permissive temperature. The levels of only 11 mRNAs changed by more than twofold, consistent with the observation that this mutant has essentially wild-type kinase activity at 25° C. In addition, the nearly identical gene expression patterns obtained for the cdc28-13 and isogenic wild-type CDC28 strain demonstrate the reproducibility of these experiments.

[0173] 31. The cdc28-4 allele, which exhibits a START (G₁-S) defect, or an allele such as cdc28-1N, which has a G₂-M defect, could also serve as a mimic of CDK inhibition by 52 or flavopiridol.

[0174] 32. S. I. Reed, J. A. Hadwiger, A. T. Lorincz, Proc. Natl. Acad. Sci. U.S.A. 82, 4055 (1985).

[0175] 33. C. Koch and K. Nasmyth, Curr. Opin. Cell Biol. 6, 451 (1994).

[0176] 34. M. Russell, J. Bradshaw-Rouse, D. Markwardt, W. Heideman, Mol. Biol. Cell 4, 757 (1993).

[0177] 35. H. Ruis and C. Schuller, BioEssays 17, 959 (1995).

[0178] 36. J. R. Woodgett, et al., Trends Biochem. Sci. 16, 177 (1991).

[0179] 37. U.S. patent application 1368.002

[0180] 38. For example, using a screen of our purine libraries, to be described elsewhere, we have identified a compound that causes extensive depolymerization of microtubules and condensation of DNA.

[0181] 39. J. Rine, W. Hansen, E. Hardeman, R. W. Davis, Proc. Natl. Acad. Sci. U.S.A. 80, 6750 (1983); M. Schena, D. Shalon, R. W. Davis, P. O. Brown, Science 270, 467 (1995).

TABLE 3

EFFECTS OF C52 + FLA + MUT 4
1. YBR214W (similar to S. pombe protein involved in meiosis + mitosis): INCREASES (4-4.5× for C52; 7-8.5× for FLA; 2-2.5× for MUT 4)
2. YBL002W (HTB2), histone: DECREASES (˜4.5× for C52; ˜2.7× for FLA; 1.9× for MUT 4)
3. YAL061W, alcohol/sorbitol dehydrogenase: INCREASES (9-11× with C52 and FLA; ˜6.1× for MUT 4; not detected in wt)
4. YKR097W (PCK1), phosphoenolpyruvate carboxykinase: INCREASES (˜7-8× with C52; ˜4× with FLA; ˜3.5× for MUT 4)

C52 UNIQUE
1. YKL071W, unknown function: 7-12× INCREASE (C52)

FLA UNIQUE
1. YCRX13W (SOL2): 7.5-10× INCREASE (* also MUT 4: 2.3-2.7× increase; C52: much smaller increase (NL. 8 × at most))

MUT 4 SPECIFIC
1. YOR202W (HIS3): 2-4× INCREASE

CHANGES COMMON TO C52 AND FLA (NOT MUTANTS)
1. YGR108W (CLB1), G2/M phase specific cyclin: DECREASES (˜8× with C52; ˜2.5-3× with FLA)
2. YNL327W (EGT2), involved in cell separation: DECREASES 2-3× (both drugs)
3. YBR114W (RAD16), nucleotide G2 repair: INCREASES 4-5× (both drugs)
4. YDR247W, serine/threonine kinase similar to S. pombe RAN1: 3-3.5× INCREASE (both drugs)
5. HXT5, homologous to hexose transporters: INCREASES; not detected in untreated (˜12-14× increase with drugs)
6. YGR043, similar to Tal1p, a transaldolase: INCREASES ˜4-7.5× with drugs
7. YGL179: INCREASES (C52 and FLA)
8. YBR296C: VERY LARGE INCREASE (15-25×)
9. YLR178C (TFS1): INCREASES ˜6×
10. YDR281C: C52 ˜4.5-5.5× INCREASE; MUT 4: 3.5-5.5× DECREASE
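
By way of illustration only, two of the notes above lend themselves to a short worked sketch: the final drug concentration implied by the additions described in note 15, and the reproducibility criterion of note 21. The code below is not the procedure actually used. All function names are hypothetical, the example ratios are invented, and the reading of note 21 adopted here (every replicate hybridization in an experiment must exceed twofold in the same direction, in at least two independent experiments) is an assumption.

```python
# Illustrative sketch only. Function names, argument layout, and the exact
# reading of the reproducibility criterion are assumptions.


def final_concentration_um(stock_mm, added_ul, culture_ml):
    """Final concentration (uM) after adding `added_ul` of stock to a culture.

    stock_mm * added_ul gives nanomoles; dividing by millilitres gives uM.
    The added volume is included in the total volume.
    """
    total_ml = culture_ml + added_ul / 1000.0
    return stock_mm * added_ul / total_ml


def _direction(replicate_ratios, threshold):
    """Return 'up' or 'down' if every replicate exceeds the fold threshold."""
    if not replicate_ratios:
        return None
    if all(r > threshold for r in replicate_ratios):
        return "up"
    if all(0 < r < 1.0 / threshold for r in replicate_ratios):
        return "down"
    return None


def passes_twofold_filter(experiments, threshold=2.0, min_experiments=2):
    """True if enough independent experiments agree on a >threshold-fold change.

    `experiments` is a list of experiments, each a list of treated/untreated
    ratios from replicate hybridizations (three per experiment in note 21).
    """
    directions = [_direction(reps, threshold) for reps in experiments]
    agreeing = max(directions.count("up"), directions.count("down"))
    return agreeing >= min_experiments


if __name__ == "__main__":
    # Note 15: 27.5 ul of a 100 mM stock added to a 110 ml culture.
    print(f"{final_concentration_um(100.0, 27.5, 110.0):.1f} uM")
    # Hypothetical ratios, invented purely to exercise the filter.
    print(passes_twofold_filter([[4.0, 4.5, 4.2], [4.1, 4.4, 4.3]]))
```

Under these assumptions the additions in note 15 correspond to a final concentration of about 25 μM. A transcript whose replicates disagree, or whose independent experiments change in opposite directions, is not passed by the filter.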

We claim:
 1. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is down-regulated in response to both compound 52 and flavopiridol.
 2. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is up-regulated in response to both compound 52 and flavopiridol.
 3. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is down-regulated in response to compound 52 but not to flavopiridol.
 4. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is up-regulated in response to compound 52 but not to flavopiridol.
 5. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is down-regulated in response to flavopiridol but not to compound 52.
 6. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is up-regulated in response to flavopiridol but not to compound 52.
 7. The set of claim 1 wherein the gene is also down-regulated in cdc28-4 mutants.
 8. The set of claim 2 wherein the gene is also up-regulated in cdc28-4 mutants.
 9. The set of claim 1 wherein the gene is up-regulated in cdc28-4 mutants.
 10. The set of claim 2 wherein the gene is down-regulated in cdc28-4 mutants.
 11. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is down-regulated in cdc28-4 mutants.
 12. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is down-regulated or up-regulated in response to both compound 52 and flavopiridol.
 13. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is up-regulated or down-regulated in cdc28-4 mutants.
 14. The set of claim 13 wherein the up-regulation or down-regulation is at least two fold as compared to wild-type.
 15. A set of at least two probes, wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is up-regulated in cdc28-4 mutants.
 16. The set of claim 13 wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is not up-regulated or down-regulated in response to compound 52 or flavopiridol.
 17. The set of claim 13 wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is not up-regulated or down-regulated in response to compound 52.
 18. The set of claim 13 wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is not up-regulated or down-regulated in response to flavopiridol.
 19. The set of claim 13 wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is up-regulated or down-regulated in response to compound 52.
 20. The set of claim 13 wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is up-regulated or down-regulated in response to flavopiridol.
 21. The set of claim 13 wherein each of said probes comprises a segment of the nucleotide sequence of a gene which is up-regulated or down-regulated in response to both flavopiridol and compound 52.
 22. The set of any of the preceding claims wherein the genes are yeast genes.
 23. The set of claim 22 wherein the yeast is Saccharomyces cerevisiae.
 24. The set of any of the preceding claims wherein the genes are human genes.
 25. The set of any of the preceding claims wherein the probes are immobilized on a solid support.
 26. The set of any of the preceding claims wherein the probes are immobilized on an array.
 27. The set of any of the preceding claims wherein up-regulation or down-regulation is determined by a difference of at least three-fold from a control.
 28. The set of any of the preceding claims which comprises at least 3 probes.
 29. The set of any of the preceding claims which comprises at least 5 probes.
 30. The set of any of the preceding claims which comprises at least 7 probes.
 31. The set of any of the preceding claims which comprises at least 9 probes.
 32. The set of any of the preceding claims which comprises at least 11 probes.
 33. The set of any of the preceding claims which comprises at least 20 probes.
 34. The set of any of the preceding claims which comprises at least 30 probes.
 35. The set of any of claims 1-31 which consists of less than 10 probes.
 36. The set of any of claims 1-32 which comprises less than 20 probes.
 37. The set of any of claims 1-33 which consists of less than 30 probes.
 38. The set of any of the preceding claims which consists of less than 100 probes.
 39. The set of any of the preceding claims which consists of less than 1000 probes.
 40. The set of any of the preceding claims which comprises less than 10000 probes.
 41. The set of claim 25 wherein at least 10% of probes on the solid support comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 42. The set of claim 25 wherein at least 20% of probes on the solid support comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 43. The set of claim 25 wherein at least 40% of probes on the solid support comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 44. The set of claim 25 wherein at least 60% of probes on the solid support comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 45. The set of claim 25 wherein at least 80% of probes on the solid support comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 46. The set of claim 25 wherein at least 90% of probes on the solid support comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 47. The set of claim 26 wherein at least 10% of probes on the array comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 48. The set of claim 26 wherein at least 20% of probes on the array comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 49. The set of claim 26 wherein at least 40% of probes on the array comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 50. The set of claim 26 wherein at least 60% of probes on the array comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 51. The set of claim 26 wherein at least 80% of probes on the array comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 52. The set of claim 26 wherein at least 90% of probes on the array comprise segments of genes whose regulation is affected by compound 52, flavopiridol, or cdc28-4.
 53. A method of comparing the specificity of drugs, comprising: contacting a first drug with a first population of cells and a second drug with a second population of said cells; preparing a transcription indicator from each of the first and the second populations of cells, wherein a transcription indicator is selected from the group consisting of cellular RNA, cellular mRNA, cRNA and cDNA; preparing a transcription indicator from a third population of said cells which is not contacted with a drug; hybridizing the transcription indicators to oligonucleotide arrays to form a pattern of hybridization for each of said populations of cells; comparing each of the first and the second populations' patterns of hybridization to the third population's pattern of hybridization to identify changes induced by the first and the second drugs; comparing changes induced by the first and second drugs, wherein a drug which effects more changes is less specific than a drug which effects fewer changes.
 54. The method of claim 53 wherein the first drug is flavopiridol.
 55. The method of claim 53 wherein the first drug is compound 52.
 56. The method of claim 53 wherein the cells are yeast cells.
 57. The method of claim 53 wherein the first and second drugs affect a common target protein.
 58. The method of claim 53 wherein the cells are mammalian cells.
 59. The method of claim 53 wherein the array comprises at least 1000 oligonucleotides of distinct sequence.
 60. The method of claim 53 wherein the array comprises at least 6000 oligonucleotides of distinct sequence.
 61. The method of claim 53 wherein the first drug has a known beneficial effect and the second drug is identified as useful if it induces a similar pattern of changes.
 62. The method of claim 53 wherein the drug is a kinase inhibitor.
 63. A method of comparing the effects of a drug to the effects of a mutation, comprising: contacting a drug with a first population of cells; preparing a transcription indicator from the first population of cells, wherein a transcription indicator is selected from the group consisting of cellular RNA, cellular mRNA, cRNA and cDNA; preparing a transcription indicator from a second population of cells which population is not contacted with a drug, wherein the second population of cells carry a mutation in a gene of interest relative to the first population of cells; preparing a transcription indicator from a third population of cells which is not contacted with a drug and which does not carry the mutation; hybridizing the transcription indicators to oligonucleotide arrays to form a pattern of hybridization for each of said populations of cells; comparing each of the first and the second populations' patterns of hybridization to the third population's pattern of hybridization to identify changes caused by the drug and the mutation; comparing the changes caused by the drug to those caused by the mutation; wherein a drug and a mutation which affect hybridization to one or more common oligonucleotides identifies the gene of interest as a candidate target of the drug; wherein a drug which affects hybridization to both common oligonucleotides and unique oligonucleotides identifies the drug as affecting targets other than the gene.
 64. The method of claim 63 wherein the drug is flavopiridol.
 65. The method of claim 63 wherein the drug is compound 52.
 66. The method of claim 63 wherein the mutation is in a kinase.
 67. The method of claim 63 wherein the mutation is in the CDC28 gene.
 68. The method of claim 63 wherein the cells are yeast cells.
 69. The method of claim 63 wherein the cells are mammalian cells.
 70. The method of claim 63 wherein the array comprises at least 1000 oligonucleotides of distinct sequence.
 71. The method of claim 63 wherein the array comprises at least 6000 oligonucleotides of distinct sequence.
 72. A method of comparing the specificity of drugs, comprising: comparing changes in expression induced by a first drug to those induced by a second drug, wherein a drug which effects more changes is less specific than a drug which effects fewer changes, wherein the changes are determined by the process of: contacting the first drug with a first population of cells and the second drug with a second population of said cells; preparing a transcription indicator from each of the first and the second populations of cells, wherein a transcription indicator is selected from the group consisting of cellular RNA, cellular mRNA, cRNA and cDNA; preparing a transcription indicator from a third population of said cells which is not contacted with a drug; hybridizing the transcription indicators to oligonucleotide arrays to form a pattern of hybridization for each of said populations of cells; and comparing a first and a second populations' patterns of hybridization to a third population's pattern of hybridization to identify changes induced by the first and the second drugs.
 73. The method of claim 72 wherein the drug is flavopiridol.
 74. The method of claim 72 wherein the drug is compound 52.
 75. A method of comparing the effects of a drug to the effects of a mutation, comprising: comparing changes in expression caused by a drug to those caused by a mutation, wherein the changes in expression are determined by the process of: contacting the drug with a first population of cells; preparing a transcription indicator from the first population of cells, wherein a transcription indicator is selected from the group consisting of cellular RNA, cellular mRNA, cRNA and cDNA; preparing a transcription indicator from a second population of cells which population is not contacted with a drug, wherein the second population of cells carry the mutation in a gene of interest relative to the first population of cells; preparing a transcription indicator from a third population of cells which is not contacted with a drug and which does not carry the mutation; hybridizing the transcription indicators to oligonucleotide arrays to form a pattern of hybridization for each of said populations of cells; comparing each of the first and the second populations' patterns of hybridization to the third population's pattern of hybridization to identify changes caused by the drug and the mutation; wherein a drug and a mutation which affect hybridization to one or more common oligonucleotides identifies the gene of interest as a candidate target of the drug; wherein a drug which affects hybridization to both common oligonucleotides and unique oligonucleotides identifies the drug as affecting targets other than the gene.
 76. The method of claim 75 wherein the drug is flavopiridol.
 77. The method of claim 75 wherein the drug is compound 52.
 78. The method of claim 75 wherein the mutation is in a kinase.
 79. The method of claim 75 wherein the mutation is in the CDC28 gene.
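
By way of illustration only, the comparisons recited in the method claims above can be sketched with simple set operations once hybridization patterns have been reduced to per-gene signal levels. The sketch below is not part of the claimed subject matter. The function names, the twofold threshold default, and the data are all hypothetical, and the gene labels GENE_A through GENE_D are placeholders rather than measured results.

```python
# Illustrative sketch only: names, threshold, and data are hypothetical.


def changed_genes(sample, control, threshold=2.0):
    """Genes whose level differs from control by at least `threshold`-fold."""
    changed = set()
    for gene, level in sample.items():
        base = control.get(gene)
        if base is None or base <= 0 or level <= 0:
            continue  # no usable measurement in both samples
        ratio = level / base
        if ratio >= threshold or ratio <= 1.0 / threshold:
            changed.add(gene)
    return changed


def less_specific_drug(changes_by_drug):
    """Name of the drug that effects the most changes (the less specific one)."""
    return max(changes_by_drug, key=lambda name: len(changes_by_drug[name]))


def compare_drug_to_mutation(drug_changes, mutant_changes):
    """Split drug-induced changes into those shared with the mutant and the rest."""
    common = drug_changes & mutant_changes
    drug_only = drug_changes - mutant_changes
    return {
        "common": common,
        "drug_only": drug_only,
        # Shared changes point to the mutated gene as a candidate target.
        "gene_is_candidate_target": bool(common),
        # Shared plus unique changes suggest targets other than the gene.
        "drug_has_other_targets": bool(common) and bool(drug_only),
    }


if __name__ == "__main__":
    control = {"GENE_A": 100, "GENE_B": 100, "GENE_C": 100, "GENE_D": 100}
    drug = {"GENE_A": 400, "GENE_B": 30, "GENE_C": 110, "GENE_D": 250}
    mutant = {"GENE_A": 250, "GENE_B": 95, "GENE_C": 100, "GENE_D": 105}

    drug_changes = changed_genes(drug, control)
    mutant_changes = changed_genes(mutant, control)
    print(compare_drug_to_mutation(drug_changes, mutant_changes))
```

Treating each pattern as a set of changed genes keeps the two comparisons symmetrical: the size of a set stands in for specificity, and intersection and difference separate the changes shared with the mutant from those unique to the drug.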